30.11.3 Marginal and conditional distributions
Given a bivariate distribution f(x, y), we may be interested only in the probability function for X irrespective of the value of Y (or vice versa). This marginal
distribution of X is obtained by summing or integrating, as appropriate, the
joint probability distribution over all allowed values of Y . Thus, the marginal
distribution of X (for example) is given by
$$
f_X(x) =
\begin{cases}
\sum_j f(x, y_j) & \text{for a discrete distribution,}\\
\int f(x, y)\,dy & \text{for a continuous distribution.}
\end{cases}
\qquad (30.130)
$$
It is clear that an analogous definition exists for the marginal distribution of Y .
Alternatively, one might be interested in the probability function of X given
that Y takes some specific value Y = y_0, i.e. Pr(X = x|Y = y_0). This conditional
distribution of X is given by
$$
g(x) = \frac{f(x, y_0)}{f_Y(y_0)},
$$
where fY (y) is the marginal distribution of Y . The division by fY (y0 ) is necessary
in order that g(x) is properly normalised.
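For a discrete bivariate distribution these operations reduce to simple sums over a
probability table. The following Python sketch (the joint table and variable names are
invented purely for illustration) computes the marginal distribution (30.130) and a
conditional distribution of X:

    import numpy as np

    # Hypothetical discrete joint distribution f(x_i, y_j):
    # rows index the values of X, columns the values of Y.
    x_vals = np.array([0.0, 1.0, 2.0])
    y_vals = np.array([0.0, 1.0])
    f = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])   # entries sum to unity

    # Marginal distributions: sum the joint distribution over the other variable.
    f_X = f.sum(axis=1)            # f_X(x_i) = sum_j f(x_i, y_j)
    f_Y = f.sum(axis=0)            # f_Y(y_j) = sum_i f(x_i, y_j)

    # Conditional distribution of X given Y = y_0 (here y_0 = 1, i.e. column 1):
    j0 = 1
    g = f[:, j0] / f_Y[j0]         # division by f_Y(y_0) normalises g(x)
    print(f_X, f_Y, g, g.sum())    # g sums to 1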
30.12 Properties of joint distributions
The probability density function f(x, y) contains all the information on the joint
probability distribution of two random variables X and Y . In a similar manner
to that presented for univariate distributions, however, it is conventional to
characterise f(x, y) by certain of its properties, which we now discuss. Once
again, most of these properties are based on the concept of expectation values,
which are defined for joint distributions in an analogous way to those for single-variable distributions (30.46). Thus, the expectation value of any function g(X, Y)
of the random variables X and Y is given by
$$
E[g(X, Y)] =
\begin{cases}
\sum_i \sum_j g(x_i, y_j) f(x_i, y_j) & \text{for the discrete case,}\\
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy & \text{for the continuous case.}
\end{cases}
$$
30.12.1 Means
The means of X and Y are defined respectively as the expectation values of the
variables X and Y . Thus, the mean of X is given by
$$
E[X] = \mu_X =
\begin{cases}
\sum_i \sum_j x_i f(x_i, y_j) & \text{for the discrete case,}\\
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f(x, y)\,dx\,dy & \text{for the continuous case.}
\end{cases}
\qquad (30.131)
$$
E[Y ] is obtained in a similar manner.
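In the discrete case such expectation values are simply weighted sums over the joint
probability table; a minimal NumPy sketch (reusing the invented table above) is:

    import numpy as np

    x_vals = np.array([0.0, 1.0, 2.0])
    y_vals = np.array([0.0, 1.0])
    f = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])

    # X[i, j] = x_i and Y[i, j] = y_j, matching the shape of the joint table f.
    X, Y = np.meshgrid(x_vals, y_vals, indexing="ij")

    def expectation(g):
        """E[g(X, Y)] = sum_i sum_j g(x_i, y_j) f(x_i, y_j)."""
        return np.sum(g(X, Y) * f)

    mu_X = expectation(lambda x, y: x)        # E[X], as in (30.131)
    mu_Y = expectation(lambda x, y: y)        # E[Y]
    E_XY = expectation(lambda x, y: x * y)    # E[XY]
    print(mu_X, mu_Y, E_XY)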
Show that if X and Y are independent random variables then E[XY ] = E[X]E[Y ].
Let us consider the case where X and Y are continuous random variables. Since X and
Y are independent, f(x, y) = f_X(x) f_Y(y), so that
$$
E[XY] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} xy\,f_X(x) f_Y(y)\,dx\,dy
= \int_{-\infty}^{\infty} x f_X(x)\,dx \int_{-\infty}^{\infty} y f_Y(y)\,dy
= E[X]E[Y].
$$
An analogous proof exists for the discrete case.
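A quick numerical check of this result for the discrete case, with made-up marginals
whose joint table is their outer product:

    import numpy as np

    # Hypothetical independent marginals; independence means f(x_i, y_j) = f_X(x_i) f_Y(y_j).
    x_vals = np.array([1.0, 2.0, 3.0])
    y_vals = np.array([0.0, 5.0])
    f_X = np.array([0.2, 0.5, 0.3])
    f_Y = np.array([0.4, 0.6])
    f = np.outer(f_X, f_Y)

    E_X = np.sum(x_vals * f_X)
    E_Y = np.sum(y_vals * f_Y)
    E_XY = np.sum(np.outer(x_vals, y_vals) * f)   # E[XY] over the joint table
    assert np.isclose(E_XY, E_X * E_Y)            # E[XY] = E[X]E[Y] for independent X, Y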
30.12.2 Variances
The definitions of the variances of X and Y are analogous to those for the
single-variable case (30.48), i.e. the variance of X is given by
$$
V[X] = \sigma_X^2 =
\begin{cases}
\sum_i \sum_j (x_i - \mu_X)^2 f(x_i, y_j) & \text{for the discrete case,}\\
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x, y)\,dx\,dy & \text{for the continuous case.}
\end{cases}
\qquad (30.132)
$$
Equivalent definitions exist for the variance of Y .
30.12.3 Covariance and correlation
Means and variances of joint distributions provide useful information about
their marginal distributions, but we have not yet given any indication of how to
measure the relationship between the two random variables. Of course, it may
be that the two random variables are independent, but often this is not so. For
example, if we measure the heights and weights of a sample of people we would
not be surprised to find a tendency for tall people to be heavier than short people
and vice versa. We will show in this section that two functions, the covariance
and the correlation, can be defined for a bivariate distribution and that these are
useful in characterising the relationship between the two random variables.
The covariance of two random variables X and Y is defined by
$$
\text{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)],
\qquad (30.133)
$$
where µX and µY are the expectation values of X and Y respectively. Clearly
related to the covariance is the correlation of the two random variables, defined
by
$$
\text{Corr}[X, Y] = \frac{\text{Cov}[X, Y]}{\sigma_X \sigma_Y},
\qquad (30.134)
$$
where σX and σY are the standard deviations of X and Y respectively. It can be
shown that the correlation function lies between −1 and +1. If the value assumed
is negative, X and Y are said to be negatively correlated; if it is positive, they are
said to be positively correlated; and if it is zero, they are said to be uncorrelated.
We will now justify the use of these terms.
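Collecting definitions (30.131)-(30.134) for the discrete case, a short sketch (again
with an invented joint table) evaluates the covariance and correlation directly:

    import numpy as np

    x_vals = np.array([0.0, 1.0, 2.0])
    y_vals = np.array([0.0, 1.0])
    f = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])
    X, Y = np.meshgrid(x_vals, y_vals, indexing="ij")

    mu_X, mu_Y = np.sum(X * f), np.sum(Y * f)             # means (30.131)
    var_X = np.sum((X - mu_X) ** 2 * f)                   # variances (30.132)
    var_Y = np.sum((Y - mu_Y) ** 2 * f)
    cov_XY = np.sum((X - mu_X) * (Y - mu_Y) * f)          # covariance (30.133)
    corr_XY = cov_XY / np.sqrt(var_X * var_Y)             # correlation (30.134)
    print(cov_XY, corr_XY)                                # corr_XY lies between -1 and +1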
One particularly useful consequence of its definition is that the covariance
of two independent variables, X and Y , is zero. It immediately follows from
(30.134) that their correlation is also zero, and this justifies the use of the term
‘uncorrelated’ for two such variables. To show this extremely important property
we first note that
$$
\begin{aligned}
\text{Cov}[X, Y] &= E[(X - \mu_X)(Y - \mu_Y)]\\
&= E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y]\\
&= E[XY] - \mu_X E[Y] - \mu_Y E[X] + \mu_X \mu_Y\\
&= E[XY] - \mu_X \mu_Y.
\end{aligned}
\qquad (30.135)
$$
Now, if X and Y are independent then E[XY ] = E[X]E[Y ] = µX µY and so
Cov[X, Y ] = 0. It is important to note that the converse of this result is not
necessarily true; two variables dependent on each other can still be uncorrelated.
In other words, it is possible (and not uncommon) for two variables X and Y
to be described by a joint distribution f(x, y) that cannot be factorised into a
product of the form g(x)h(y), but for which Corr[X, Y ] = 0. Indeed, from the
definition (30.133), we see that for any joint distribution f(x, y) that is symmetric
in x about µX (or similarly in y) we have Corr[X, Y ] = 0.
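For instance (an invented illustration), let X be uniform on {-1, 0, 1} and Y = X^2:
the joint distribution does not factorise, yet the covariance vanishes because X is
distributed symmetrically about mu_X = 0:

    import numpy as np

    x_vals = np.array([-1.0, 0.0, 1.0])
    p = np.array([1/3, 1/3, 1/3])      # X uniform, symmetric about 0
    y_vals = x_vals ** 2               # Y = X^2 is completely determined by X

    mu_X = np.sum(x_vals * p)                            # = 0
    mu_Y = np.sum(y_vals * p)                            # = 2/3
    cov = np.sum((x_vals - mu_X) * (y_vals - mu_Y) * p)
    print(cov)                                           # 0.0: uncorrelated, yet dependent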
We have already asserted that if the correlation of two random variables is
positive (negative) they are said to be positively (negatively) correlated. We have
also stated that the correlation lies between −1 and +1. The terminology suggests
that if the two RVs are identical (i.e. X = Y ) then they are completely correlated
and that their correlation should be +1. Likewise, if X = −Y then the functions
are completely anticorrelated and their correlation should be −1. Values of the
correlation function between these extremes show the existence of some degree
of correlation. In fact it is not necessary that X = Y for Corr[X, Y ] = 1; it is
sufficient that Y is a linear function of X, i.e. Y = aX + b (with a positive). If a
is negative then Corr[X, Y ] = −1. To show this we first note that µY = aµX + b.
Now
$$
Y = aX + b = aX + \mu_Y - a\mu_X
\quad\Rightarrow\quad
Y - \mu_Y = a(X - \mu_X),
$$
and so using the definition of the covariance (30.133)
$$
\text{Cov}[X, Y] = aE[(X - \mu_X)^2] = a\sigma_X^2.
$$
It follows from the properties of the variance (subsection 30.5.3) that σY = |a|σX
and so, using the definition (30.134) of the correlation,
$$
\text{Corr}[X, Y] = \frac{a\sigma_X^2}{|a|\,\sigma_X^2} = \frac{a}{|a|},
$$
which is the stated result.
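This result is easily demonstrated numerically; in the sketch below (sample size and
coefficients chosen arbitrarily) the sample correlation of X and aX + b comes out as +1
or -1 according to the sign of a:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100_000)        # samples of X (any distribution would do)

    for a, b in [(2.5, 1.0), (-0.7, 3.0)]:
        y = a * x + b                   # Y is an exact linear function of X
        corr = np.corrcoef(x, y)[0, 1]  # sample estimate of Corr[X, Y]
        print(a, round(corr, 6))        # +1.0 for a > 0, -1.0 for a < 0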
It should be noted that, even if the possibilities of X and Y being non-zero are
mutually exclusive, Corr[X, Y] need not take the value ±1.
A biased die gives probabilities $\frac{1}{2}p$, $p$, $p$, $p$, $p$, $2p$ of throwing 1, 2, 3, 4, 5, 6 respectively.
If the random variable X is the number shown on the die and the random variable Y is
defined as $X^2$, calculate the covariance and correlation of X and Y.
We have already calculated in subsections 30.2.1 and 30.5.4 that
$$
p = \frac{2}{13}, \qquad E[X] = \frac{53}{13}, \qquad E[X^2] = \frac{253}{13}, \qquad V[X] = \frac{480}{169}.
$$
Using (30.135), we obtain
$$
\text{Cov}[X, Y] = \text{Cov}[X, X^2] = E[X^3] - E[X]E[X^2].
$$
Now $E[X^3]$ is given by
$$
E[X^3] = 1^3 \times \tfrac{1}{2}p + (2^3 + 3^3 + 4^3 + 5^3)p + 6^3 \times 2p
= \frac{1313}{2}\,p = 101,
$$
and the covariance of X and Y is given by
$$
\text{Cov}[X, Y] = 101 - \frac{53}{13} \times \frac{253}{13} = \frac{3660}{169}.
$$
The correlation is defined by $\text{Corr}[X, Y] = \text{Cov}[X, Y]/\sigma_X \sigma_Y$. The standard deviation of
Y may be calculated from the definition of the variance. Letting $\mu_Y = E[X^2] = \frac{253}{13}$ gives
$$
\begin{aligned}
\sigma_Y^2 &= \tfrac{1}{2}p\left(1^2 - \mu_Y\right)^2 + p\left(2^2 - \mu_Y\right)^2 + p\left(3^2 - \mu_Y\right)^2 + p\left(4^2 - \mu_Y\right)^2\\
&\quad + p\left(5^2 - \mu_Y\right)^2 + 2p\left(6^2 - \mu_Y\right)^2\\
&= \frac{187\,356}{169}\,p = \frac{28\,824}{169}.
\end{aligned}
$$
We deduce that
$$
\text{Corr}[X, Y] = \frac{3660/169}{\sqrt{(480/169)(28\,824/169)}} \approx 0.984.
$$
Thus the random variables X and Y display a strong degree of positive correlation, as we
would expect.
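The numbers in this example can be checked directly from the die probabilities; the
sketch below (written for illustration, using exact fractions) recovers
Cov[X, Y] = 3660/169 and Corr[X, Y] ≈ 0.984:

    import math
    from fractions import Fraction as F

    # Biased die: P(1) = p/2, P(2) = ... = P(5) = p, P(6) = 2p, with p = 2/13.
    p = F(2, 13)
    prob = {1: p / 2, 2: p, 3: p, 4: p, 5: p, 6: 2 * p}
    assert sum(prob.values()) == 1

    def E(g):
        """Expectation of g(X) for the biased die."""
        return sum(g(k) * pk for k, pk in prob.items())

    E_X, E_X2, E_X3 = E(lambda k: k), E(lambda k: k**2), E(lambda k: k**3)
    var_X = E_X2 - E_X**2                  # 480/169
    var_Y = E(lambda k: k**4) - E_X2**2    # V[X^2] = 28824/169
    cov_XY = E_X3 - E_X * E_X2             # E[X^3] - E[X]E[X^2] = 3660/169
    corr_XY = float(cov_XY) / math.sqrt(float(var_X * var_Y))
    print(cov_XY, round(corr_XY, 3))       # 3660/169 and 0.984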
We note that the covariance of X and Y occurs in various expressions. For
example, if X and Y are not independent then
$$
\begin{aligned}
V[X + Y] &= E\left[(X + Y)^2\right] - \left(E[X + Y]\right)^2\\
&= E[X^2] + 2E[XY] + E[Y^2] - \left\{(E[X])^2 + 2E[X]E[Y] + (E[Y])^2\right\}\\
&= V[X] + V[Y] + 2\left(E[XY] - E[X]E[Y]\right)\\
&= V[X] + V[Y] + 2\,\text{Cov}[X, Y].
\end{aligned}
$$
More generally, we find (for a, b and c constant)
$$
V[aX + bY + c] = a^2 V[X] + b^2 V[Y] + 2ab\,\text{Cov}[X, Y].
\qquad (30.136)
$$
Note that if X and Y are in fact independent then Cov[X, Y ] = 0 and we recover
the expression (30.68) in subsection 30.6.4.
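A Monte Carlo check of (30.136) with correlated inputs (the bivariate normal
parameters below are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    cov_matrix = np.array([[2.0, 0.8],     # V[X] and Cov[X, Y]
                           [0.8, 1.5]])    # Cov[X, Y] and V[Y]
    x, y = rng.multivariate_normal([0.0, 0.0], cov_matrix, size=500_000).T

    a, b, c = 3.0, -2.0, 5.0
    lhs = np.var(a * x + b * y + c)        # V[aX + bY + c] estimated from the samples
    rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y)[0, 1]
    print(lhs, rhs)                        # agree up to sampling error, as in (30.136)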
We may use (30.136) to obtain an approximate expression for V [ f(X, Y )]
for any arbitrary function f, even when the random variables X and Y are
correlated. Approximating f(X, Y ) by the linear terms of its Taylor expansion
about the point (µX , µY ), we have
$$
f(X, Y) \approx f(\mu_X, \mu_Y) + \frac{\partial f}{\partial X}(X - \mu_X) + \frac{\partial f}{\partial Y}(Y - \mu_Y),
\qquad (30.137)
$$
where the partial derivatives are evaluated at X = µX and Y = µY . Taking the
variance of both sides, and using (30.136), we find
$$
V[f(X, Y)] \approx \left(\frac{\partial f}{\partial X}\right)^2 V[X] + \left(\frac{\partial f}{\partial Y}\right)^2 V[Y] + 2\,\frac{\partial f}{\partial X}\,\frac{\partial f}{\partial Y}\,\text{Cov}[X, Y].
\qquad (30.138)
$$
Clearly, if Cov[X, Y ] = 0, we recover the result (30.69) derived in subsection 30.6.4.
We note that (30.138) is exact if f(X, Y ) is linear in X and Y .
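As an illustration of (30.138), the sketch below (an invented example with
f(X, Y) = XY and arbitrary means and covariances) compares the linearised variance
with a direct Monte Carlo estimate:

    import numpy as np

    rng = np.random.default_rng(2)
    mu_X, mu_Y = 10.0, 4.0
    cov = np.array([[0.25, 0.10],          # V[X] and Cov[X, Y]
                    [0.10, 0.09]])         # Cov[X, Y] and V[Y]
    x, y = rng.multivariate_normal([mu_X, mu_Y], cov, size=500_000).T

    # f(X, Y) = X * Y, so df/dX = Y and df/dY = X, evaluated at (mu_X, mu_Y).
    dfdX, dfdY = mu_Y, mu_X
    v_linear = dfdX**2 * cov[0, 0] + dfdY**2 * cov[1, 1] + 2 * dfdX * dfdY * cov[0, 1]

    v_mc = np.var(x * y)                   # direct estimate, includes higher-order terms
    print(v_linear, v_mc)                  # close, since the relative fluctuations are small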
For several variables Xi , i = 1, 2, . . . , n, we can define the symmetric (positive
definite) covariance matrix whose elements are
$$
V_{ij} = \text{Cov}[X_i, X_j],
\qquad (30.139)
$$
and the symmetric (positive definite) correlation matrix
$$
\rho_{ij} = \text{Corr}[X_i, X_j].
$$
The diagonal elements of the covariance matrix are the variances of the variables,
whilst those of the correlation matrix are unity. For several variables, (30.138)
generalises to
$$
V[f(X_1, X_2, \ldots, X_n)] \approx \sum_i \left(\frac{\partial f}{\partial X_i}\right)^2 V[X_i] + \sum_i \sum_{j \neq i} \frac{\partial f}{\partial X_i}\,\frac{\partial f}{\partial X_j}\,\text{Cov}[X_i, X_j],
$$
where the partial derivatives are evaluated at $X_i = \mu_{X_i}$.
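In practice the covariance and correlation matrices are usually estimated from data;
with NumPy (the data below are random stand-ins) each is a single call:

    import numpy as np

    rng = np.random.default_rng(3)
    # Toy data: four correlated variables, 10 000 observations each (rows = variables).
    base = rng.normal(size=(4, 10_000))
    data = np.array([base[0],
                     base[0] + 0.5 * base[1],
                     base[2],
                     base[2] - base[3]])

    V = np.cov(data)          # covariance matrix: V[i, j] estimates Cov[X_i, X_j]
    rho = np.corrcoef(data)   # correlation matrix: diagonal elements are unity
    print(np.allclose(np.diag(rho), 1.0))   # True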
A card is drawn at random from a normal 52-card pack and its identity noted. The card
is replaced, the pack shuffled and the process repeated. Random variables W , X, Y , Z are
defined as follows:
W = 2 if the drawn card is a heart; W = 0 otherwise.
X = 4 if the drawn card is an ace, king, or queen; X = 2 if the card is a jack or ten;
X = 0 otherwise.
Y = 1 if the drawn card is red; Y = 0 otherwise.
Z = 2 if the drawn card is black and an ace, king or queen; Z = 0 otherwise.
Establish the correlation matrix for W , X, Y , Z.
The means of the variables are given by
$$
\mu_W = 2 \times \tfrac{1}{4} = \tfrac{1}{2}, \qquad
\mu_X = 4 \times \tfrac{3}{13} + 2 \times \tfrac{2}{13} = \tfrac{16}{13}, \qquad
\mu_Y = 1 \times \tfrac{1}{2} = \tfrac{1}{2}, \qquad
\mu_Z = 2 \times \tfrac{6}{52} = \tfrac{3}{13}.
$$
The variances, calculated from $\sigma_U^2 = V[U] = E[U^2] - (E[U])^2$, where $U = W$, $X$, $Y$ or $Z$, are
$$
\sigma_W^2 = 4 \times \tfrac{1}{4} - \left(\tfrac{1}{2}\right)^2 = \tfrac{3}{4}, \qquad
\sigma_X^2 = 16 \times \tfrac{3}{13} + 4 \times \tfrac{2}{13} - \left(\tfrac{16}{13}\right)^2 = \tfrac{472}{169},
$$
$$
\sigma_Y^2 = 1 \times \tfrac{1}{2} - \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}, \qquad
\sigma_Z^2 = 4 \times \tfrac{6}{52} - \left(\tfrac{3}{13}\right)^2 = \tfrac{69}{169}.
$$
The covariances are found by first calculating $E[WX]$ etc. and then forming $E[WX] - \mu_W \mu_X$ etc.:
$$
\begin{aligned}
E[WX] &= 2(4)\tfrac{3}{52} + 2(2)\tfrac{2}{52} = \tfrac{8}{13}, &
\text{Cov}[W, X] &= \tfrac{8}{13} - \tfrac{1}{2}\times\tfrac{16}{13} = 0,\\
E[WY] &= 2(1)\tfrac{1}{4} = \tfrac{1}{2}, &
\text{Cov}[W, Y] &= \tfrac{1}{2} - \tfrac{1}{2}\times\tfrac{1}{2} = \tfrac{1}{4},\\
E[WZ] &= 0, &
\text{Cov}[W, Z] &= 0 - \tfrac{1}{2}\times\tfrac{3}{13} = -\tfrac{3}{26},\\
E[XY] &= 4(1)\tfrac{6}{52} + 2(1)\tfrac{4}{52} = \tfrac{8}{13}, &
\text{Cov}[X, Y] &= \tfrac{8}{13} - \tfrac{16}{13}\times\tfrac{1}{2} = 0,\\
E[XZ] &= 4(2)\tfrac{6}{52} = \tfrac{12}{13}, &
\text{Cov}[X, Z] &= \tfrac{12}{13} - \tfrac{16}{13}\times\tfrac{3}{13} = \tfrac{108}{169},\\
E[YZ] &= 0, &
\text{Cov}[Y, Z] &= 0 - \tfrac{1}{2}\times\tfrac{3}{13} = -\tfrac{3}{26}.
\end{aligned}
$$
The correlations Corr[W, X] and Corr[X, Y] are clearly zero; the remainder are given by
$$
\begin{aligned}
\text{Corr}[W, Y] &= \tfrac{1}{4}\left(\tfrac{3}{4} \times \tfrac{1}{4}\right)^{-1/2} = 0.577, &
\text{Corr}[W, Z] &= -\tfrac{3}{26}\left(\tfrac{3}{4} \times \tfrac{69}{169}\right)^{-1/2} = -0.209,\\
\text{Corr}[X, Z] &= \tfrac{108}{169}\left(\tfrac{472}{169} \times \tfrac{69}{169}\right)^{-1/2} = 0.598, &
\text{Corr}[Y, Z] &= -\tfrac{3}{26}\left(\tfrac{1}{4} \times \tfrac{69}{169}\right)^{-1/2} = -0.361.
\end{aligned}
$$
Finally, then, we can write down the correlation matrix:
$$
\rho =
\begin{pmatrix}
1 & 0 & 0.58 & -0.21\\
0 & 1 & 0 & 0.60\\
0.58 & 0 & 1 & -0.36\\
-0.21 & 0.60 & -0.36 & 1
\end{pmatrix}.
$$
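This example lends itself to a direct check by enumerating the 52 equally likely
cards; the sketch below is illustrative (the suit and rank labels are merely chosen
for the enumeration):

    import itertools
    import numpy as np

    suits = ["hearts", "diamonds", "clubs", "spades"]    # the first two are red
    ranks = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]

    rows = []
    for suit, rank in itertools.product(suits, ranks):   # the 52 equally likely cards
        w = 2 if suit == "hearts" else 0
        x = 4 if rank in ("A", "K", "Q") else (2 if rank in ("J", "10") else 0)
        y = 1 if suit in ("hearts", "diamonds") else 0
        z = 2 if suit in ("clubs", "spades") and rank in ("A", "K", "Q") else 0
        rows.append((w, x, y, z))

    data = np.array(rows, dtype=float).T                 # shape (4, 52); rows are W, X, Y, Z
    # Correlation is unaffected by the variance normalisation, so np.corrcoef over the
    # 52 equally likely outcomes reproduces the correlation matrix rho above.
    print(np.round(np.corrcoef(data), 2))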