where
$$
J \equiv \frac{\partial(x_1, x_2, \ldots, x_n)}{\partial(y_1, y_2, \ldots, y_n)} =
\begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_1}\\
\vdots & \ddots & \vdots\\
\dfrac{\partial x_1}{\partial y_n} & \cdots & \dfrac{\partial x_n}{\partial y_n}
\end{vmatrix},
$$
is the Jacobian of the $x_i$ with respect to the $y_j$.

Suppose that the random variables $X_i$, $i = 1, 2, \ldots, n$, are independent and Gaussian distributed with means $\mu_i$ and variances $\sigma_i^2$ respectively. Find the PDF for the new variables $Z_i = (X_i - \mu_i)/\sigma_i$, $i = 1, 2, \ldots, n$. By considering an elemental spherical shell in $Z$-space, find the PDF of the chi-squared random variable $\chi^2_n = \sum_{i=1}^{n} Z_i^2$.

Since the $X_i$ are independent random variables,
$$
f(x_1, x_2, \ldots, x_n) = f(x_1)f(x_2)\cdots f(x_n)
= \frac{1}{(2\pi)^{n/2}\sigma_1\sigma_2\cdots\sigma_n}
\prod_{i=1}^{n}\exp\left[-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}\right].
$$
To derive the PDF for the variables $Z_i$, we require
$$
|f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2\cdots dx_n| = |g(z_1, z_2, \ldots, z_n)\,dz_1\,dz_2\cdots dz_n|,
$$
and, noting that $dz_i = dx_i/\sigma_i$, we obtain
$$
g(z_1, z_2, \ldots, z_n) = \frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}\sum_{i=1}^{n} z_i^2\right).
$$

Let us now consider the random variable $\chi^2_n = \sum_{i=1}^{n} Z_i^2$, which we may regard as the square of the distance from the origin in the $n$-dimensional $Z$-space. We now require that
$$
g(z_1, z_2, \ldots, z_n)\,dz_1\,dz_2\cdots dz_n = h(\chi^2_n)\,d\chi^2_n.
$$
If we consider the infinitesimal volume $dV = dz_1\,dz_2\cdots dz_n$ to be that enclosed by the $n$-dimensional spherical shell of radius $\chi_n$ and thickness $d\chi_n$, then we may write $dV = A\chi_n^{n-1}\,d\chi_n$ for some constant $A$. We thus obtain
$$
h(\chi^2_n)\,d\chi^2_n \propto \exp\left(-\tfrac{1}{2}\chi^2_n\right)\chi_n^{n-1}\,d\chi_n
\propto \exp\left(-\tfrac{1}{2}\chi^2_n\right)\chi_n^{n-2}\,d\chi^2_n,
$$
where we have used the fact that $d\chi^2_n = 2\chi_n\,d\chi_n$. Thus we see that the PDF for $\chi^2_n$ is given by
$$
h(\chi^2_n) = B\exp\left(-\tfrac{1}{2}\chi^2_n\right)\chi_n^{n-2},
$$
for some constant $B$. This constant may be determined from the normalisation condition
$$
\int_0^{\infty} h(\chi^2_n)\,d\chi^2_n = 1,
$$
and is found to be $B = [2^{n/2}\,\Gamma(\tfrac{1}{2}n)]^{-1}$. This is the $n$th-order chi-squared distribution discussed in subsection 30.9.4.

30.15 Important joint distributions

In this section we will examine two important multivariate distributions, the multinomial distribution, which is an extension of the binomial distribution, and the multivariate Gaussian distribution.

30.15.1 The multinomial distribution

The binomial distribution describes the probability of obtaining $x$ 'successes' from $n$ independent trials, where each trial has only two possible outcomes. This may be generalised to the case where each trial has $k$ possible outcomes with respective probabilities $p_1, p_2, \ldots, p_k$. If we consider the random variables $X_i$, $i = 1, 2, \ldots, k$, to be the number of outcomes of type $i$ in $n$ trials, then we may calculate their joint probability function
$$
f(x_1, x_2, \ldots, x_k) = \Pr(X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_k = x_k),
$$
where we must have $\sum_{i=1}^{k} x_i = n$. In $n$ trials the probability of obtaining $x_1$ outcomes of type 1, followed by $x_2$ outcomes of type 2, etc. is given by $p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$. However, the number of distinguishable permutations of this result is
$$
\frac{n!}{x_1!\,x_2!\cdots x_k!},
$$
and thus
$$
f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}. \qquad (30.146)
$$
This is the multinomial probability distribution. If $k = 2$ then the multinomial distribution reduces to the familiar binomial distribution. Although in this form the binomial distribution appears to be a function of two random variables, it must be remembered that, in fact, since $p_2 = 1 - p_1$ and $x_2 = n - x_1$, the distribution of $X_1$ is entirely determined by the parameters $p_1$ and $n$.
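As a quick numerical check of (30.146), the sketch below evaluates the multinomial probability function directly and compares it with `scipy.stats.multinomial`; the trial counts and probabilities used are illustrative assumptions, not values from the text. It also confirms the $k = 2$ reduction to the binomial distribution.

```python
# Direct evaluation of the multinomial pmf (30.146) versus scipy,
# and the k = 2 reduction to the binomial distribution.
from math import factorial

from scipy.stats import binom, multinomial


def multinomial_pmf(xs, ps):
    """Evaluate (30.146): n!/(x1!...xk!) * p1^x1 * ... * pk^xk."""
    n = sum(xs)
    coeff = factorial(n)
    for x in xs:
        coeff //= factorial(x)          # multinomial coefficient
    prob = float(coeff)
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob


xs, ps = [1, 3, 2], [0.2, 0.5, 0.3]     # illustrative values
print(multinomial_pmf(xs, ps))                 # 0.135
print(multinomial.pmf(xs, n=sum(xs), p=ps))    # same value from scipy

# For k = 2 the multinomial pmf of (x1, n - x1) equals Pr(X1 = x1)
# under the binomial distribution with parameters n and p1.
print(multinomial.pmf([2, 4], n=6, p=[0.3, 0.7]))
print(binom.pmf(2, n=6, p=0.3))                # agrees
```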
That $X_1$ has a binomial distribution is shown by remembering that it represents the number of objects of a particular type obtained from sampling with replacement, which led to the original definition of the binomial distribution. In fact, any of the random variables $X_i$ has a binomial distribution, i.e. the marginal distribution of each $X_i$ is binomial with parameters $n$ and $p_i$. It immediately follows that
$$
E[X_i] = np_i \quad\text{and}\quad V[X_i] = np_i(1 - p_i). \qquad (30.147)
$$

At a village fête patrons were invited, for a 10 p entry fee, to pick without looking six tickets from a drum containing equal large numbers of red, blue and green tickets. If five or more of the tickets were of the same colour a prize of 100 p was awarded. A consolation award of 40 p was made if two tickets of each colour were picked. Was a good time had by all?

In this case, all types of outcome (red, blue and green) have the same probability. The probability of obtaining any given combination of tickets is given by the multinomial distribution with $n = 6$, $k = 3$ and $p_i = \tfrac{1}{3}$, $i = 1, 2, 3$.

(i) The probability of picking six tickets of the same colour is given by
$$
\Pr(\text{six of the same colour}) = 3 \times \frac{6!}{6!\,0!\,0!}\left(\frac{1}{3}\right)^6\left(\frac{1}{3}\right)^0\left(\frac{1}{3}\right)^0 = \frac{1}{243}.
$$
The factor of 3 is present because there are three different colours.

(ii) The probability of picking five tickets of one colour and one ticket of another colour is
$$
\Pr(\text{five of one colour; one of another}) = 3 \times 2 \times \frac{6!}{5!\,1!\,0!}\left(\frac{1}{3}\right)^5\left(\frac{1}{3}\right)^1\left(\frac{1}{3}\right)^0 = \frac{4}{81}.
$$
The factors of 3 and 2 are included because there are three ways to choose the colour of the five matching tickets, and then two ways to choose the colour of the remaining ticket.

(iii) Finally, the probability of picking two tickets of each colour is
$$
\Pr(\text{two of each colour}) = \frac{6!}{2!\,2!\,2!}\left(\frac{1}{3}\right)^2\left(\frac{1}{3}\right)^2\left(\frac{1}{3}\right)^2 = \frac{10}{81}.
$$

Thus the expected return to any patron was, in pence,
$$
100\left(\frac{1}{243} + \frac{4}{81}\right) + 40 \times \frac{10}{81} = 10.29.
$$
A good time was had by all but the stallholder!

30.15.2 The multivariate Gaussian distribution

A particularly interesting multivariate distribution is provided by the generalisation of the Gaussian distribution to multiple random variables $X_i$, $i = 1, 2, \ldots, n$. If the expectation value of $X_i$ is $E(X_i) = \mu_i$ then the general form of the PDF is given by
$$
f(x_1, x_2, \ldots, x_n) = N\exp\left[-\frac{1}{2}\sum_i\sum_j a_{ij}(x_i - \mu_i)(x_j - \mu_j)\right],
$$
where $a_{ij} = a_{ji}$ and $N$ is a normalisation constant that we give below. If we write the column vectors $\mathbf{x} = (x_1\ x_2\ \cdots\ x_n)^{\mathrm T}$ and $\boldsymbol{\mu} = (\mu_1\ \mu_2\ \cdots\ \mu_n)^{\mathrm T}$, and denote the matrix with elements $a_{ij}$ by $A$, then
$$
f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n) = N\exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathrm T}A(\mathbf{x} - \boldsymbol{\mu})\right],
$$
where $A$ is symmetric. Using the same method as that used to derive (30.145), it is straightforward to show that the MGF of $f(\mathbf{x})$ is given by
$$
M(t_1, t_2, \ldots, t_n) = \exp\left(\boldsymbol{\mu}^{\mathrm T}\mathbf{t} + \tfrac{1}{2}\mathbf{t}^{\mathrm T}A^{-1}\mathbf{t}\right),
$$
where the column matrix $\mathbf{t} = (t_1\ t_2\ \cdots\ t_n)^{\mathrm T}$. From the MGF, we find that
$$
E[X_i X_j] = \frac{\partial^2 M(0, 0, \ldots, 0)}{\partial t_i\,\partial t_j} = \mu_i\mu_j + (A^{-1})_{ij},
$$
and thus, using (30.135), we obtain
$$
\operatorname{Cov}[X_i, X_j] = E[(X_i - \mu_i)(X_j - \mu_j)] = (A^{-1})_{ij}.
$$
Hence $A$ is equal to the inverse of the covariance matrix $V$ of the $X_i$; see (30.139). Thus, with the correct normalisation, $f(\mathbf{x})$ is given by
$$
f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}(\det V)^{1/2}}\exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathrm T}V^{-1}(\mathbf{x} - \boldsymbol{\mu})\right]. \qquad (30.148)
$$
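As a numerical illustration of (30.148), the following sketch evaluates the PDF directly from the covariance matrix and compares the result with `scipy.stats.multivariate_normal`; the mean vector and covariance matrix here are illustrative assumptions.

```python
# Evaluate the multivariate Gaussian PDF (30.148) directly and
# compare with scipy; mu and V below are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
V = np.array([[2.0, 0.6],
              [0.6, 1.0]])           # symmetric, positive definite

x = np.array([0.5, -1.5])
d = x - mu
n = len(mu)

# Direct evaluation of (30.148); A = V^{-1} appears in the exponent.
norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(V)))
pdf = norm * np.exp(-0.5 * d @ np.linalg.inv(V) @ d)

print(pdf)
print(multivariate_normal(mean=mu, cov=V).pdf(x))  # same value
```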
Evaluate the integral
$$
I = \int \exp\left[-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathrm T}V^{-1}(\mathbf{x} - \boldsymbol{\mu})\right]d^n x,
$$
where $V$ is a symmetric matrix, and hence verify the normalisation in (30.148).

We begin by making the substitution $\mathbf{y} = \mathbf{x} - \boldsymbol{\mu}$ to obtain
$$
I = \int \exp\left(-\tfrac{1}{2}\mathbf{y}^{\mathrm T}V^{-1}\mathbf{y}\right)d^n y.
$$
Since $V$ is a symmetric matrix, it may be diagonalised by an orthogonal transformation to the new set of variables $\mathbf{y}' = S^{\mathrm T}\mathbf{y}$, where $S$ is the orthogonal matrix with the normalised eigenvectors of $V$ as its columns (see section 8.16). In this new basis, the matrix $V$ becomes
$$
V' = S^{\mathrm T}VS = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n),
$$
where the $\lambda_i$ are the eigenvalues of $V$. Also, since $S$ is orthogonal, $\det S = \pm 1$, and so
$$
d^n y = |\det S|\,d^n y' = d^n y'.
$$
Thus we can write $I$ as
$$
I = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}
\exp\left(-\sum_{i=1}^{n}\frac{y_i'^2}{2\lambda_i}\right)dy_1'\,dy_2'\cdots dy_n'
= \prod_{i=1}^{n}\int_{-\infty}^{\infty}\exp\left(-\frac{y_i'^2}{2\lambda_i}\right)dy_i'
= (2\pi)^{n/2}(\lambda_1\lambda_2\cdots\lambda_n)^{1/2}, \qquad (30.149)
$$
where we have used the standard integral $\int_{-\infty}^{\infty}\exp(-\alpha y^2)\,dy = (\pi/\alpha)^{1/2}$ (see subsection 6.4.2). From section 8.16, however, we note that the product of eigenvalues in (30.149) is equal to $\det V$. Thus we finally obtain $I = (2\pi)^{n/2}(\det V)^{1/2}$, and hence the normalisation in (30.148) ensures that $f(\mathbf{x})$ integrates to unity.

The above example illustrates some important points concerning the multivariate Gaussian distribution. In particular, we note that the $Y_i'$ are independent Gaussian variables with mean zero and variance $\lambda_i$. Thus, given a general set of $n$ Gaussian variables $\mathbf{x}$ with means $\boldsymbol{\mu}$ and covariance matrix $V$, one can always perform the above transformation to obtain a new set of variables $\mathbf{y}'$, which are linear combinations of the old ones and are distributed as independent Gaussians with zero mean and variances $\lambda_i$. This result is extremely useful in proving many of the properties of the multivariate Gaussian distribution.
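The decorrelating transformation $\mathbf{y}' = S^{\mathrm T}\mathbf{y}$ used above is easy to check numerically. A minimal sketch, assuming an illustrative $2\times 2$ covariance matrix: draw correlated Gaussian samples, rotate them into the eigenbasis of $V$, and confirm that the sample covariance of the transformed variables is approximately $\operatorname{diag}(\lambda_1, \lambda_2)$.

```python
# Check that y' = S^T y decorrelates Gaussian variables: the sample
# covariance of y' should be approximately diag(lambda_i).
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])
V = np.array([[2.0, 0.6],
              [0.6, 1.0]])            # illustrative covariance matrix

# Columns of S are the normalised eigenvectors of V.
lam, S = np.linalg.eigh(V)

# Draw correlated samples, centre them, rotate into the eigenbasis.
x = rng.multivariate_normal(mu, V, size=200_000)
y = x - mu
y_prime = y @ S                        # row-wise application of S^T

print(lam)                             # eigenvalues lambda_i
print(np.cov(y_prime, rowvar=False))   # ~ diag(lambda_1, lambda_2)
```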