Properties of distributions
In many circumstances, however, random variables do not depend on one
another, i.e. they are independent. As an example, for a person drawn at random
from a population, we might expect height and IQ to be independent random
variables. Let us suppose that X and Y are two random variables with probability
density functions g(x) and h(y) respectively. In mathematical terms, X and Y are
independent RVs if their joint probability density function is given by f(x, y) =
g(x)h(y). Thus, for independent RVs, if X and Y are both discrete then
Pr(X = xi , Y = yj ) = g(xi )h(yj )
or, if X and Y are both continuous, then
Pr(x < X ≤ x + dx, y < Y ≤ y + dy) = g(x)h(y) dx dy.
The important point in each case is that the RHS is simply the product of the
individual probability density functions (compare with the expression for Pr(A∩B)
in (30.22) for statistically independent events A and B). By a simple extension,
one may also consider the case where one of the random variables is discrete and
the other continuous. The above discussion may also be trivially extended to any
number of independent RVs Xi , i = 1, 2, . . . , N.
The independent random variables X and Y have the PDFs g(x) = e^{-x} and h(y) = 2e^{-2y}
respectively. Calculate the probability that X lies in the interval 1 < X ≤ 2 and Y lies in
the interval 0 < Y ≤ 1.
Since X and Y are independent RVs, the required probability is given by

Pr(1 < X \le 2,\ 0 < Y \le 1) = \int_1^2 g(x)\,dx \int_0^1 h(y)\,dy = \int_1^2 e^{-x}\,dx \int_0^1 2e^{-2y}\,dy
= \bigl[-e^{-x}\bigr]_1^2 \times \bigl[-e^{-2y}\bigr]_0^1 = 0.23 \times 0.86 = 0.20.
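As a quick numerical cross-check of this factorised probability (a sketch assuming Python with SciPy is available), one can integrate each marginal density over its interval and multiply the results:

```python
import numpy as np
from scipy.integrate import quad

# Marginal PDFs of the independent RVs X and Y from the example above
g = lambda x: np.exp(-x)               # g(x) = e^(-x)
h = lambda y: 2.0 * np.exp(-2.0 * y)   # h(y) = 2e^(-2y)

# For independent RVs the joint probability over a rectangle factorises
px, _ = quad(g, 1, 2)   # Pr(1 < X <= 2) ~ 0.233
py, _ = quad(h, 0, 1)   # Pr(0 < Y <= 1) ~ 0.865
print(px * py)          # ~ 0.20, in agreement with the result above
```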
30.5 Properties of distributions
For a single random variable X, the probability density function f(x) contains
all possible information about how the variable is distributed. However, for the
purposes of comparison, it is conventional and useful to characterise f(x) by
certain of its properties. Most of these standard properties are defined in terms
of averages or expectation values. In the most general case, the expectation value
E[g(X)] of any function g(X) of the random variable X is defined as
E[g(X)] =
    \sum_i g(x_i) f(x_i)     for a discrete distribution,
    \int g(x) f(x)\,dx       for a continuous distribution,        (30.45)
where the sum or integral is over all allowed values of X. It is assumed that
the series is absolutely convergent or that the integral exists, as the case may be.
From its definition it is straightforward to show that the expectation value has
the following properties:
(i) if a is a constant then E[a] = a;
(ii) if a is a constant then E[ag(X)] = aE[g(X)];
(iii) if g(X) = s(X) + t(X) then E[g(X)] = E[s(X)] + E[t(X)].
It should be noted that the expectation value is not a function of X but is
instead a number that depends on the form of the probability density function
f(x) and the function g(x). Most of the standard quantities used to characterise
f(x) are simply the expectation values of various functions of the random variable
X. We now consider these standard quantities.
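To make these properties concrete, here is a minimal sketch (plain Python, with a small hypothetical discrete distribution) that evaluates E[g(X)] by direct summation and confirms properties (ii) and (iii) numerically.

```python
# A hypothetical discrete distribution: values x_i with probabilities f(x_i)
xs = [0, 1, 2, 3]
fs = [0.1, 0.4, 0.3, 0.2]   # probabilities must sum to unity

def E(g):
    """Expectation value E[g(X)] = sum_i g(x_i) f(x_i) for a discrete distribution."""
    return sum(g(x) * f for x, f in zip(xs, fs))

s = lambda x: x**2
t = lambda x: 3*x + 1

print(E(lambda x: 5 * s(x)), 5 * E(s))         # property (ii): E[a g(X)] = a E[g(X)]
print(E(lambda x: s(x) + t(x)), E(s) + E(t))   # property (iii): E[s + t] = E[s] + E[t]
```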
30.5.1 Mean
The property most commonly used to characterise a probability distribution is
its mean, which is defined simply as the expectation value E[X] of the variable X
itself. Thus, the mean is given by
E[X] =
    \sum_i x_i f(x_i)     for a discrete distribution,
    \int x f(x)\,dx       for a continuous distribution.        (30.46)
The alternative notations µ and x̄ are also commonly used to denote the mean.
If in (30.46) the series is not absolutely convergent, or the integral does not exist,
we say that the distribution does not have a mean, but this is very rare in physical
applications.
The probability of finding a 1s electron in a hydrogen atom in a given infinitesimal volume
dV is ψ ∗ ψ dV , where the quantum mechanical wavefunction ψ is given by
ψ = A e^{-r/a_0}.
Find the value of the real constant A and thereby deduce the mean distance of the electron
from the origin.
Let us consider the random variable R = ‘distance of the electron from the origin’. Since
the 1s orbital has no θ- or φ-dependence (it is spherically symmetric), we may consider
the infinitesimal volume element dV as the spherical shell with inner radius r and outer
radius r + dr. Thus, dV = 4\pi r^2\,dr and the PDF of R is simply

Pr(r < R \le r + dr) \equiv f(r)\,dr = 4\pi r^2 A^2 e^{-2r/a_0}\,dr.
The value of A is found by requiring the total probability (i.e. the probability that the
electron is somewhere) to be unity. Since R must lie between zero and infinity, we require
that
A^2 \int_0^\infty 4\pi r^2 e^{-2r/a_0}\,dr = 1.
Integrating by parts we find A = 1/(\pi a_0^3)^{1/2}. Now, using the definition of the mean (30.46), we find

E[R] = \int_0^\infty r f(r)\,dr = \frac{4}{a_0^3} \int_0^\infty r^3 e^{-2r/a_0}\,dr.

The integral on the RHS may be integrated by parts and takes the value 3a_0^4/8; consequently we find that E[R] = 3a_0/2.
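The normalisation constant and the mean found above are easily reproduced symbolically; the following is a short sketch assuming SymPy is available.

```python
import sympy as sp

r, a0 = sp.symbols('r a0', positive=True)
A = 1 / sp.sqrt(sp.pi * a0**3)                    # A = 1/(pi*a0^3)^(1/2), as found above
f = 4 * sp.pi * r**2 * A**2 * sp.exp(-2*r/a0)     # PDF of R

print(sp.integrate(f, (r, 0, sp.oo)))             # total probability: 1
print(sp.integrate(r * f, (r, 0, sp.oo)))         # mean E[R]: 3*a0/2
```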
30.5.2 Mode and median
Although the mean discussed in the last section is the most common measure
of the ‘average’ of a distribution, two other measures, which do not rely on the
concept of expectation values, are frequently encountered.
The mode of a distribution is the value of the random variable X at which the
probability (density) function f(x) has its greatest value. If there is more than one
value of X for which this is true then each value may equally be called the mode
of the distribution.
The median M of a distribution is the value of the random variable X at which
the cumulative probability function F(x) takes the value 1/2, i.e. F(M) = 1/2. Related
to the median are the lower and upper quartiles Ql and Qu of the PDF, which
are defined such that
F(Ql) = 1/4,    F(Qu) = 3/4.
Thus the median and lower and upper quartiles divide the PDF into four regions
each containing one quarter of the probability. Smaller subdivisions are also
possible, e.g. the nth percentile, Pn , of a PDF is defined by F(Pn ) = n/100.
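As an illustration of these definitions (a numerical sketch assuming SciPy, and taking a0 = 1 for convenience), the median and quartiles of the electron-distance PDF from the previous example can be obtained by solving F(r) = 1/2, 1/4 and 3/4:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

a0 = 1.0
f = lambda r: 4 * r**2 * np.exp(-2*r/a0) / a0**3   # PDF of the electron's distance R
F = lambda r: quad(f, 0, r)[0]                     # cumulative probability function F(r)

Ql     = brentq(lambda r: F(r) - 0.25, 1e-9, 20*a0)   # lower quartile
median = brentq(lambda r: F(r) - 0.50, 1e-9, 20*a0)   # median M
Qu     = brentq(lambda r: F(r) - 0.75, 1e-9, 20*a0)   # upper quartile
print(Ql, median, Qu)                                 # all in units of a0
```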
Find the mode of the PDF for the distance from the origin of the electron whose wavefunction was given in the previous example.
We found in the previous example that the PDF for the electron’s distance from the origin
was given by
f(r) = \frac{4r^2}{a_0^3}\, e^{-2r/a_0}.        (30.47)

Differentiating f(r) with respect to r, we obtain

\frac{df}{dr} = \frac{8r}{a_0^3}\left(1 - \frac{r}{a_0}\right) e^{-2r/a_0}.
Thus f(r) has turning points at r = 0 and r = a0 , where df/dr = 0. It is straightforward
to show that r = 0 is a minimum and r = a0 is a maximum. Moreover, it is also clear that
r = a0 is a global maximum (as opposed to just a local one). Thus the mode of f(r) occurs
at r = a0.
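A one-line numerical confirmation of this mode (a sketch assuming SciPy, again with a0 = 1) simply maximises f(r):

```python
import numpy as np
from scipy.optimize import minimize_scalar

a0 = 1.0
f = lambda r: 4 * r**2 * np.exp(-2*r/a0) / a0**3          # PDF (30.47)
res = minimize_scalar(lambda r: -f(r), bounds=(0, 20*a0), method='bounded')
print(res.x)   # ~ 1.0, i.e. the mode lies at r = a0
```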
30.5.3 Variance and standard deviation
The variance of a distribution, V[X], also written σ², is defined by

V[X] = E[(X - \mu)^2] =
    \sum_j (x_j - \mu)^2 f(x_j)     for a discrete distribution,
    \int (x - \mu)^2 f(x)\,dx       for a continuous distribution.        (30.48)
Here µ has been written for the expectation value E[X] of X. As in the case of
the mean, unless the series and the integral in (30.48) converge the distribution
does not have a variance. From the definition (30.48) we may easily derive the
following useful properties of V [X]. If a and b are constants then
(i) V [a] = 0,
(ii) V [aX + b] = a2 V [X].
The variance of a distribution is always positive; its positive square root is
known as the standard deviation of the distribution and is often denoted by σ.
Roughly speaking, σ measures the spread (about x = µ) of the values that X can
assume.
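Both properties are easy to confirm numerically; the short sketch below (plain Python, reusing the hypothetical discrete distribution from the expectation-value example) checks V[a] = 0 and V[aX + b] = a²V[X].

```python
xs = [0, 1, 2, 3]
fs = [0.1, 0.4, 0.3, 0.2]   # a hypothetical discrete distribution

def E(g):
    return sum(g(x) * f for x, f in zip(xs, fs))

def V(g):
    mu = E(g)                                  # variance via definition (30.48)
    return E(lambda x: (g(x) - mu)**2)

a, b = 3.0, 7.0
print(V(lambda x: b))                               # 0: the variance of a constant vanishes
print(V(lambda x: a*x + b), a**2 * V(lambda x: x))  # equal: V[aX + b] = a^2 V[X]
```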
Find the standard deviation of the PDF for the distance from the origin of the electron
whose wavefunction was discussed in the previous two examples.
Inserting the expression (30.47) for the PDF f(r) into (30.48), the variance of the random
variable R is given by
V[R] = \int_0^\infty (r - \mu)^2 \frac{4r^2}{a_0^3} e^{-2r/a_0}\,dr = \frac{4}{a_0^3} \int_0^\infty (r^4 - 2r^3\mu + r^2\mu^2)\, e^{-2r/a_0}\,dr,

where the mean µ = E[R] = 3a0/2. Integrating each term in the integrand by parts we obtain

V[R] = 3a_0^2 - 3\mu a_0 + \mu^2 = \frac{3a_0^2}{4}.

Thus the standard deviation of the distribution is σ = \sqrt{3}\,a_0/2.
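This variance and standard deviation can also be checked symbolically; a minimal sketch assuming SymPy:

```python
import sympy as sp

r, a0 = sp.symbols('r a0', positive=True)
f = 4 * r**2 * sp.exp(-2*r/a0) / a0**3               # PDF (30.47)

mu  = sp.integrate(r * f, (r, 0, sp.oo))             # 3*a0/2
var = sp.integrate((r - mu)**2 * f, (r, 0, sp.oo))   # 3*a0**2/4
print(mu, sp.simplify(var), sp.sqrt(sp.simplify(var)))   # the last entry is sqrt(3)*a0/2
```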
We may also use the definition (30.48) to derive the Bienaymé–Chebyshev
inequality, which provides a useful upper limit on the probability that a random
variable X takes values outside a given range centred on the mean. Let us consider
the case of a continuous random variable, for which
Pr(|X - \mu| \ge c) = \int_{|x-\mu| \ge c} f(x)\,dx,
where the integral on the RHS extends over all values of x satisfying the inequality
|x − µ| ≥ c. From (30.48), we find that
\sigma^2 \ge \int_{|x-\mu| \ge c} (x - \mu)^2 f(x)\,dx \ge c^2 \int_{|x-\mu| \ge c} f(x)\,dx.        (30.49)
The first inequality holds because both (x − µ)2 and f(x) are non-negative for
all x, and the second inequality holds because (x − µ)2 ≥ c2 over the range of
integration. However, the RHS of (30.49) is simply equal to c2 Pr(|X − µ| ≥ c),
and thus we obtain the required inequality
Pr(|X - \mu| \ge c) \le \frac{\sigma^2}{c^2}.
A similar derivation may be carried through for the case of a discrete random
variable. Thus, for any distribution f(x) that possesses a variance we have, for
example,
Pr(|X - \mu| \ge 2\sigma) \le \tfrac{1}{4}    and    Pr(|X - \mu| \ge 3\sigma) \le \tfrac{1}{9}.
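As a concrete check of the Bienaymé–Chebyshev bound (a sketch assuming SciPy), the exact tail probabilities of the exponential distribution f(x) = e^{-x} on x ≥ 0, which has µ = σ = 1, can be compared with σ²/c²:

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)    # exponential PDF on x >= 0, for which mu = sigma = 1
mu, sigma = 1.0, 1.0

for k in (2, 3):
    c = k * sigma
    # exact Pr(|X - mu| >= c); here the lower tail is empty because mu - c < 0
    upper = quad(f, mu + c, np.inf)[0]
    lower = quad(f, 0, mu - c)[0] if mu - c > 0 else 0.0
    print(k, upper + lower, sigma**2 / c**2)   # exact tail probability vs. Chebyshev bound 1/k^2
```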
30.5.4 Moments
The mean (or expectation) of X is sometimes called the first moment of X, since
it is defined as the sum or integral of the probability density function multiplied
by the first power of x. By a simple extension the kth moment of a distribution
is defined by
\mu_k \equiv E[X^k] =
    \sum_j x_j^k f(x_j)     for a discrete distribution,
    \int x^k f(x)\,dx       for a continuous distribution.        (30.50)
For notational convenience, we have introduced the symbol µk to denote E[X^k],
the kth moment of the distribution. Clearly, the mean of the distribution is then
denoted by µ1 , often abbreviated simply to µ, as in the previous subsection, as
this rarely causes confusion.
A useful result that relates the second moment, the mean and the variance of
a distribution is proved using the properties of the expectation operator:
V[X] = E[(X - \mu)^2]
     = E[X^2 - 2\mu X + \mu^2]
     = E[X^2] - 2\mu E[X] + \mu^2
     = E[X^2] - 2\mu^2 + \mu^2
     = E[X^2] - \mu^2.        (30.51)
In alternative notations, this result can be written
\overline{(x - \mu)^2} = \overline{x^2} - \bar{x}^2    or    \sigma^2 = \mu_2 - \mu_1^2.
A biased die has probabilities p/2, p, p, p, p, 2p of showing 1, 2, 3, 4, 5, 6 respectively. Find
(i) the mean, (ii) the second moment and (iii) the variance of this probability distribution.
By demanding that the sum of the probabilities equals unity we require p = 2/13. Now,
using the definition of the mean (30.46) for a discrete distribution,
E[X] = \sum_j x_j f(x_j) = 1 \times \tfrac{1}{2}p + 2 \times p + 3 \times p + 4 \times p + 5 \times p + 6 \times 2p
     = \tfrac{53}{2}\,p = \tfrac{53}{2} \times \tfrac{2}{13} = \tfrac{53}{13}.
Similarly, using the definition of the second moment (30.50),
E[X^2] = \sum_j x_j^2 f(x_j) = 1^2 \times \tfrac{1}{2}p + 2^2 p + 3^2 p + 4^2 p + 5^2 p + 6^2 \times 2p
       = \tfrac{253}{2}\,p = \tfrac{253}{13}.
Finally, using the definition of the variance (30.48), with µ = 53/13, we obtain
V[X] = \sum_j (x_j - \mu)^2 f(x_j)
     = (1 - \mu)^2 \tfrac{1}{2}p + (2 - \mu)^2 p + (3 - \mu)^2 p + (4 - \mu)^2 p + (5 - \mu)^2 p + (6 - \mu)^2\, 2p
     = \tfrac{3120}{169}\,p = \tfrac{480}{169}.
It is easy to verify that V[X] = E[X^2] − (E[X])^2.

In practice, to calculate the moments of a distribution it is often simpler to use
the moment generating function discussed in subsection 30.7.2. This is particularly
true for higher-order moments, where direct evaluation of the sum or integral in
(30.50) can be somewhat laborious.
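For completeness, the biased-die example above can be reproduced exactly in a few lines of plain Python using rational arithmetic:

```python
from fractions import Fraction

p = Fraction(2, 13)
faces = [1, 2, 3, 4, 5, 6]
probs = [p/2, p, p, p, p, 2*p]           # probabilities of showing 1, 2, ..., 6
assert sum(probs) == 1

mean = sum(x * q for x, q in zip(faces, probs))               # 53/13
mom2 = sum(x**2 * q for x, q in zip(faces, probs))            # 253/13
var  = sum((x - mean)**2 * q for x, q in zip(faces, probs))   # 480/169
print(mean, mom2, var, mom2 - mean**2)    # the last two agree: V[X] = E[X^2] - (E[X])^2
```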
30.5.5 Central moments
The variance V [X] is sometimes called the second central moment of the distribution, since it is defined as the sum or integral of the probability density function
multiplied by the second power of x − µ. The origin of the term ‘central’ is that by
subtracting µ from x before squaring we are considering the moment about the
mean of the distribution, rather than about x = 0. Thus the kth central moment
of a distribution is defined as
\nu_k \equiv E[(X - \mu)^k] =
    \sum_j (x_j - \mu)^k f(x_j)     for a discrete distribution,
    \int (x - \mu)^k f(x)\,dx       for a continuous distribution.        (30.52)
It is convenient to introduce the notation νk for the kth central moment. Thus
V[X] ≡ ν₂ and we may write (30.51) as ν₂ = µ₂ − µ₁². Clearly, the first central
moment of a distribution is always zero since, for example in the continuous case,
\nu_1 = \int (x - \mu) f(x)\,dx = \int x f(x)\,dx - \mu \int f(x)\,dx = \mu - (\mu \times 1) = 0.
We note that the notation µk and νk for the moments and central moments
respectively is not universal. Indeed, in some books their meanings are reversed.
We can write the kth central moment of a distribution in terms of its kth and
lower-order moments by expanding (X − µ)k in powers of X. We have already
noted that ν₂ = µ₂ − µ₁², and similar expressions may be obtained for higher-order
central moments. For example,
\nu_3 = E[(X - \mu_1)^3]
      = E[X^3 - 3\mu_1 X^2 + 3\mu_1^2 X - \mu_1^3]
      = \mu_3 - 3\mu_1\mu_2 + 3\mu_1^2\mu_1 - \mu_1^3
      = \mu_3 - 3\mu_1\mu_2 + 2\mu_1^3.        (30.53)
In general, it is straightforward to show that
\nu_k = \mu_k - \,^k C_1\, \mu_{k-1}\mu_1 + \cdots + (-1)^r\, ^k C_r\, \mu_{k-r}\mu_1^r + \cdots + (-1)^{k-1}\left(^k C_{k-1} - 1\right)\mu_1^k.        (30.54)
Once again, direct evaluation of the sum or integral in (30.52) can be rather
tedious for higher moments, and it is usually quicker to use the moment generating
function (see subsection 30.7.2), from which the central moments can be easily
evaluated as well.
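The expansion (30.54) is easy to exercise numerically; the sketch below (plain Python, reusing the biased die from the previous subsection) builds ν_k from the raw moments via the binomial expansion and compares it with a direct evaluation of the definition (30.52).

```python
from fractions import Fraction
from math import comb

p = Fraction(2, 13)
faces = [1, 2, 3, 4, 5, 6]
probs = [p/2, p, p, p, p, 2*p]            # the biased die of the previous subsection

def mu(k):                                # raw moment mu_k = E[X^k], eq. (30.50)
    return sum(x**k * q for x, q in zip(faces, probs))

def nu_direct(k):                         # central moment nu_k from the definition (30.52)
    m1 = mu(1)
    return sum((x - m1)**k * q for x, q in zip(faces, probs))

def nu_from_raw(k):                       # nu_k from the binomial expansion behind (30.54)
    m1 = mu(1)
    return sum((-1)**r * comb(k, r) * mu(k - r) * m1**r for r in range(k + 1))

for k in (2, 3, 4):
    print(k, nu_direct(k), nu_from_raw(k))   # the two evaluations agree for each k
```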
The PDF for a Gaussian distribution (see subsection 30.9.1) with mean µ and variance
σ² is given by

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right].
Obtain an expression for the kth central moment of this distribution.
As an illustration, we will perform this calculation by evaluating the integral in (30.52)
directly. Thus, the kth central moment of f(x) is given by
\nu_k = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx
      = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x - \mu)^k \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] dx
      = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} y^k \exp\left( -\frac{y^2}{2\sigma^2} \right) dy,        (30.55)
where in the last line we have made the substitution y = x − µ. It is clear that if k is
odd then the integrand is an odd function of y and hence the integral equals zero. Thus,
νk = 0 if k is odd. When k is even, we could calculate νk by integrating by parts to obtain
a reduction formula, but it is more elegant to consider instead the standard integral (see
subsection 6.4.2)
I = \int_{-\infty}^{\infty} \exp(-\alpha y^2)\,dy = \pi^{1/2} \alpha^{-1/2},
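As a numerical companion to this calculation (a sketch assuming SciPy, with the illustrative values µ = 0.5 and σ = 2), the integral in (30.55) can be evaluated directly; as argued above, the odd central moments vanish.

```python
import numpy as np
from scipy.integrate import quad

mu, sigma = 0.5, 2.0    # illustrative values
f = lambda x: np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def nu(k):
    # kth central moment of the Gaussian, evaluated directly from (30.55)
    return quad(lambda x: (x - mu)**k * f(x), -np.inf, np.inf)[0]

print([round(nu(k), 8) for k in (1, 3, 5)])   # odd central moments: all effectively zero
print(nu(2), sigma**2)                        # second central moment equals sigma^2 = 4
print(nu(4), 3 * sigma**4)                    # fourth central moment equals 3*sigma^4 = 48
```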