Generating functions

by taratuta

Category: Documents





Generating functions
variance of both sides of (30.66), and using (30.68), we find
V [ Z(X, Y )] ≈
V [X] +
V [Y ],
the partial derivatives being evaluated at X = µX and Y = µY .
30.7 Generating functions
As we saw in chapter 16, when dealing with particular sets of functions fn ,
each member of the set being characterised by a different non-negative integer
n, it is sometimes possible to summarise the whole set by a single function of a
dummy variable (say t), called a generating function. The relationship between
the generating function and the nth member fn of the set is that if the generating
function is expanded as a power series in t then fn is the coefficient of tn . For
example, in the expansion of the generating function G(z, t) = (1 − 2zt + t2 )−1/2 ,
the coefficient of tn is the nth Legendre polynomial Pn (z), i.e.
G(z, t) = (1 − 2zt + t2 )−1/2 =
Pn (z)tn .
We found that many useful properties of, and relationships between, the members
of a set of functions could be established using the generating function and other
functions obtained from it, e.g. its derivatives.
Similar ideas can be used in the area of probability theory, and two types of
generating function can be usefully defined, one more generally applicable than
the other. The more restricted of the two, applicable only to discrete integral
distributions, is called a probability generating function; this is discussed in the
next section. The second type, a moment generating function, can be used with
both discrete and continuous distributions and is considered in subsection 30.7.2.
From the moment generating function, we may also construct the closely related characteristic and cumulant generating functions; these are discussed in
subsections 30.7.3 and 30.7.4 respectively.
30.7.1 Probability generating functions
As already indicated, probability generating functions are restricted in applicability to integer distributions, of which the most common (the binomial, the Poisson
and the geometric) are considered in this and later subsections. In such distributions a random variable may take only non-negative integer values. The actual
possible values may be finite or infinite in number, but, for formal purposes,
all integers, 0, 1, 2, . . . are considered possible. If only a finite number of integer
values can occur in any particular case then those that cannot occur are included
but are assigned zero probability.
If, as previously, the probability that the random variable X takes the value xn
is f(xn ), then
f(xn ) = 1.
In the present case, however, only non-negative integer values of xn are possible,
and we can, without ambiguity, write the probability that X takes the value n as
fn , with
fn = 1.
We may now define the probability generating function ΦX (t) by
ΦX (t) ≡
fn tn .
It is immediately apparent that ΦX (t) = E[tX ] and that, by virtue of (30.70),
ΦX (1) = 1.
Probably the simplest example of a probability generating function (PGF) is
provided by the random variable X defined by
1 if the outcome of a single trial is a ‘success’,
0 if the trial ends in ‘failure’.
If the probability of success is p and that of failure q (= 1 − p) then
ΦX (t) = qt0 + pt1 + 0 + 0 + · · · = q + pt.
This type of random variable is discussed much more fully in subsection 30.8.1.
In a similar but slightly more complicated way, a Poisson-distributed integer
variable with mean λ (see subsection 30.8.4) has a PGF
ΦX (t) =
e−λ λn
tn = e−λ eλt .
We note that, as required, ΦX (1) = 1 in both cases.
Useful results will be obtained from this kind of approach only if the summation
(30.71) can be carried out explicitly in particular cases and the functions derived
from ΦX (t) can be shown to be related to meaningful parameters. Two such
relationships can be obtained by differentiating (30.71) with respect to t. Taking
the first derivative we find
dΦX (t) nfn tn−1 ⇒ ΦX (1) =
nfn = E[X],
and differentiating once more we obtain
d2 ΦX (t) =
n(n − 1)fn tn−2 ⇒ ΦX (1) =
n(n − 1)fn = E[X(X − 1)].
Equation (30.74) shows that ΦX (1) gives the mean of X. Using both (30.75) and
(30.51) allows us to write
ΦX (1) + ΦX (1) − ΦX (1) = E[X(X − 1)] + E[X] − (E[X])2
= E X 2 − E[X] + E[X] − (E[X])2
= E X 2 − (E[X])2
= V [X],
and so express the variance of X in terms of the derivatives of its probability
generating function.
A random variable X is given by the number of trials needed to obtain a first success
when the chance of success at each trial is constant and equal to p. Find the probability
generating function for X and use it to determine the mean and variance of X.
Clearly, at least one trial is needed, and so f0 = 0. If n (≥ 1) trials are needed for the first
success, the first n − 1 trials must have resulted in failure. Thus
Pr(X = n) = q n−1 p,
n ≥ 1,
where q = 1 − p is the probability of failure in each individual trial.
The corresponding probability generating function is thus
fn tn =
(q n−1 p)tn
ΦX (t) =
(qt)n = ×
q n=1
1 − qt
1 − qt
where we have used the result for the sum of a geometric series, given in chapter 4, to
obtain a closed-form expression for ΦX (t). Again, as must be the case, ΦX (1) = 1.
To find the mean and variance of X we need to evaluate ΦX (1) and ΦX (1). Differentiating
(30.78) gives
ΦX (t) =
⇒ ΦX (1) = 2 = ,
(1 − qt)2
⇒ ΦX (1) = 3 = 2 .
ΦX (t) =
(1 − qt)3
Thus, using (30.74) and (30.76),
E[X] = ΦX (1) = ,
V [X] = ΦX (1) + ΦX (1) − [ΦX (1)]2
= 2 + − 2 = 2.
p p
A distribution with probabilities of the general form (30.77) is known as a geometric
distribution and is discussed in subsection 30.8.2. This form of distribution is common in
‘waiting time’ problems (subsection 30.9.3). 1159
Figure 30.10 The pairs of values of n and r used in the evaluation of ΦX+Y (t).
Sums of random variables
We now turn to considering the sum of two or more independent random
variables, say X and Y , and denote by S2 the random variable
S2 = X + Y .
If ΦS2 (t) is the PGF for S2 , the coefficient of tn in its expansion is given by the
probability that X + Y = n and is thus equal to the sum of the probabilities that
X = r and Y = n − r for all values of r in 0 ≤ r ≤ n. Since such outcomes for
different values of r are mutually exclusive, we have
Pr(X + Y = n) =
Pr(X = r) Pr(Y = n − r).
Multiplying both sides of (30.79) by tn and summing over all values of n enables
us to express this relationship in terms of probability generating functions as
ΦX+Y (t) =
Pr(X + Y = n)tn =
∞ n
Pr(X = r)tr Pr(Y = n − r)tn−r
n=0 r=0
∞ ∞
Pr(X = r)tr Pr(Y = n − r)tn−r .
r=0 n=r
The change in summation order is justified by reference to figure 30.10, which
illustrates that the summations are over exactly the same pairs of values of n and
r, but with the first (inner) summation over the points in a column rather than
over the points in a row. Now, setting n = r + s gives the final result,
ΦX+Y (t) =
Pr(X = r)tr
Pr(Y = s)ts
= ΦX (t)ΦY (t),
i.e. the PGF of the sum of two independent random variables is equal to the
product of their individual PGFs. The same result can be deduced in a less formal
way by noting that if X and Y are independent then
E tX+Y = E tX E tY .
Clearly result (30.80) can be extended to more than two random variables by
writing S3 = S2 + Z etc., to give
Φ( ni=1 Xi ) (t) =
ΦXi (t),
and, further, if all the Xi have the same probability distribution,
Φ( ni=1 Xi ) (t) = [ΦX (t)]n .
This latter result has immediate application in the deduction of the PGF for the
binomial distribution from that for a single trial, equation (30.72).
Variable-length sums of random variables
As a final result in the theory of probability generating functions we show how to
calculate the PGF for a sum of N random variables, all with the same probability
distribution, when the value of N is itself a random variable but one with a
known probability distribution. In symbols, we wish to find the distribution
SN = X1 + X2 + · · · + XN ,
where N is a random variable with Pr(N = n) = hn and PGF χN (t) =
hn t .
The probability ξk that SN = k is given by a sum of conditional probabilities,
ξk =
Pr(N = n) Pr(X0 + X1 + X2 + · · · + Xn = k)
hn × coefficient of tk in [ΦX (t)]n .
Multiplying both sides of this equation by tk and summing over all k, we obtain
Formally X0 = 0 has to be included, since Pr(N = 0) may be non-zero.
an expression for the PGF ΞS (t) of SN :
ΞS (t) =
ξk tk =
hn × coefficient of tk in [ΦX (t)]n
tk × coefficient of tk in [ΦX (t)]n
hn [ΦX (t)]n
= χN (ΦX (t)).
In words, the PGF of the sum SN is given by the compound function χN (ΦX (t))
obtained by substituting ΦX (t) for t in the PGF for the number of terms N in
the sum. We illustrate this with the following example.
The probability distribution for the number of eggs in a clutch is Poisson distributed with
mean λ, and the probability that each egg will hatch is p (and is independent of the size of
the clutch). Use the results stated in (30.72) and (30.73) to show that the PGF (and hence
the probability distribution) for the number of chicks that hatch corresponds to a Poisson
distribution having mean λp.
The number of chicks that hatch is given by a sum of the form (30.83) in which Xi = 1 if
the ith chick hatches and Xi = 0 if it does not. As given by (30.72), ΦX (t) is thus (1−p)+pt.
The value of N is given by a Poisson distribution with mean λ; thus, from (30.73), in the
terminology of our previous discussion,
χN (t) = e−λ eλt .
We now substitute these forms into (30.84) to obtain
ΞS (t) = exp(−λ) exp[λΦX (t)]
= exp(−λ) exp{λ[(1 − p) + pt]}
= exp(−λp) exp(λpt).
But this is exactly the PGF of a Poisson distribution with mean λp.
That this implies that the probability is Poisson distributed is intuitively obvious since,
in the expansion of the PGF as a power series in t, every coefficient will be precisely
that implied by such a distribution. A solution of the same problem by direct calculation
appears in the answer to exercise 30.29. 30.7.2 Moment generating functions
As we saw in section 30.5 a probability function is often expressed in terms of
its moments. This leads naturally to the second type of generating function, a
moment generating function. For a random variable X, and a real number t, the
moment generating function (MGF) is defined by
tX etxi f(xi ) for a discrete distribution,
= i tx
MX (t) = E e
e f(x) dx for a continuous distribution. (30.85)
The MGF will exist for all values of t provided that X is bounded and always
exists at the point t = 0 where M(0) = E(1) = 1.
It will be apparent that the PGF and the MGF for a random variable X
are closely related. The former is the expectation of tX whilst the latter is the
expectation of etX :
ΦX (t) = E tX ,
MX (t) = E etX .
The MGF can thus be obtained from the PGF by replacing t by et , and vice
versa. The MGF has more general applicability, however, since it can be used
with both continuous and discrete distributions whilst the PGF is restricted to
non-negative integer distributions.
As its name suggests, the MGF is particularly useful for obtaining the moments
of a distribution, as is easily seen by noting that
t2 X 2
+ ···
E etX = E 1 + tX +
= 1 + E[X]t + E X 2
+ ··· .
Assuming that the MGF exists for all t around the point t = 0, we can deduce
that the moments of a distribution are given in terms of its MGF by
dn MX (t) E[X n ] =
dtn t=0
Similarly, by substitution in (30.51), the variance of the distribution is given by
V [X] = MX (0) − MX (0) ,
where the prime denotes differentiation with respect to t.
The MGF for the Gaussian distribution (see the end of subsection 30.9.1) is given by
MX (t) = exp µt + 12 σ 2 t2 .
Find the expectation and variance of this distribution.
Using (30.86),
MX (t) = µ + σ 2 t exp µt + 12 σ 2 t2
MX (t) = σ 2 + (µ + σ 2 t)2 exp µt + 12 σ 2 t2
E[X] = MX (0) = µ,
MX (0) = σ 2 + µ2 .
Thus, using (30.87),
V [X] = σ 2 + µ2 − µ2 = σ 2 .
That the mean is found to be µ and the variance σ 2 justifies the use of these symbols in
the Gaussian distribution. The moment generating function has several useful properties that follow from
its definition and can be employed in simplifying calculations.
Scaling and shifting
If Y = aX + b, where a and b are arbitrary constants, then
MY (t) = E etY = E et(aX+b) = ebt E eatX = ebt MX (at).
This result is often useful for obtaining the central moments of a distribution. If the
MFG of X is MX (t) then the variable Y = X−µ has the MGF MY (t) = e−µt MX (t),
which clearly generates the central moments of X, i.e.
d −µt
E[(X − µ)n ] = E[Y n ] = MY(n) (0) =
Sums of random variables
If X1 , X2 , . . . , XN are independent random variables and SN = X1 + X2 + · · · + XN
t(X1 +X2 +···+XN ) tSN tXi
=E e
MSN (t) = E e
Since the Xi are independent,
MSN (t) =
E etXi =
MXi (t).
In words, the MGF of the sum of N independent random variables is the product
of their individual MGFs. By combining (30.89) with (30.88), we obtain the more
general result that the MGF of SN = c1 X1 + c2 X2 + · · · + cN XN (where the ci are
constants) is given by
MSN (t) =
MXi (ci t).
Variable-length sums of random variables
Let us consider the sum of N independent random variables Xi (i = 1, 2, . . . , N), all
with the same probability distribution, and let us suppose that N is itself a random
variable with a known distribution. Following the notation of section 30.7.1,
SN = X1 + X2 + · · · + XN ,
where N is a random variable with Pr(N = n) = hn and probability generating
hn t . For definiteness, let us assume that the Xi are continuous
function χN (t) =
RVs (an analogous discussion can be given in the discrete case). Thus, the
probability that value of SN lies in the interval s to s + ds is given by§
Pr(s < SN ≤ s + ds) =
Pr(N = n) Pr(s < X0 + X1 + X2 · · · + Xn ≤ s + ds).
Write Pr(s < SN ≤ s + ds) as fN (s) ds and Pr(s < X0 + X1 + X2 · · · + Xn ≤ s + ds)
as fn (s) ds. The kth moment of the PDF fN (s) is given by
µk = sk fN (s) ds = sk
Pr(N = n)fn (s) ds
Pr(N = n)
sk fn (s) ds
hn × (k!× coefficient of tk in [MX (t)]n )
Thus the MGF of SN is given by
MSN (t) =
tk =
tk × coefficient of tk in [MX (t)]n
hn [MX (t)]n
= χN (MX (t)).
In words, the MGF of the sum SN is given by the compound function χN (MX (t))
obtained by substituting MX (t) for t in the PGF for the number of terms N in
the sum.
If the MGF of the random variable X1 is identical to that for X2 then the
probability distributions of X1 and X2 are identical. This is intuitively reasonable
although a rigorous proof is complicated,¶ and beyond the scope of this book.
30.7.3 Characteristic function
The characteristic function (CF) of a random variable X is defined as
itX eitxj f(xj ) for a discrete distribution,
= jitx
CX (t) = E e
e f(x) dx for a continuous distribution
As in the previous section, X0 has to be formally included, since Pr(N = 0) may be non-zero.
See, for example, P. A. Moran, An Introduction to Probability Theory (New York: Oxford Science
Publications, 1984).
so that CX (t) = MX (it), where MX (t) is the MGF of X. Clearly, the characteristic
function and the MGF are very closely related and can be used interchangeably.
Because of the formal similarity between the definitions of CX (t) and MX (t), the
characteristic function possesses analogous properties to those listed in the previous section for the MGF, with only minor modifications. Indeed, by substituting it
for t in any of the relations obeyed by the MGF and noting that CX (t) = MX (it),
we obtain the corresponding relationship for the characteristic function. Thus, for
example, the moments of X are given in terms of the derivatives of CX (t) by
E[X n ] = (−i)n CX(n) (0).
Similarly, if Y = aX + b then CY (t) = eibt CX (at).
Whether to describe a random variable by its characteristic function or by its
MGF is partly a matter of personal preference. However, the use of the CF does
have some advantages. Most importantly, the replacement of the exponential etX
in the definition of the MGF by the complex oscillatory function eitX in the CF
means that in the latter we avoid any difficulties associated with convergence of
the relevant sum or integral. Furthermore, when X is a continous RV, we see
from (30.91) that CX (t) is related to the Fourier transform of the PDF f(x). As
a consequence of Fourier’s inversion theorem, we may obtain f(x) from CX (t) by
performing the inverse transform
CX (t)e−itx dt.
f(x) =
2π −∞
30.7.4 Cumulant generating function
As mentioned at the end of subsection 30.5.5, we may also describe a probability
density function f(x) in terms of its cumulants. These quantities may be expressed
in terms of the moments of the distribution and are important in sampling theory,
which we discuss in the next chapter. The cumulants of a distribution are best
defined in terms of its cumulant generating function (CGF), given by KX (t) =
ln MX (t) where MX (t) is the MGF of the distribution. If KX (t) is expanded as a
power series in t then the kth cumulant κk of f(x) is the coefficient of tk /k!:
KX (t) = ln MX (t) ≡ κ1 t + κ2
+ κ3 + · · · .
Since MX (0) = 1, KX (t) contains no constant term.
Find all the cumulants of the Gaussian distribution discussed in the previous example.
The moment generating function for the Gaussian distribution is MX (t) = exp µt + 12 σ 2 t2 .
Thus, the cumulant generating function has the simple form
KX (t) = ln MX (t) = µt + 12 σ 2 t2 .
Comparing this expression with (30.92), we find that κ1 = µ, κ2 = σ 2 and all other
cumulants are equal to zero. We may obtain expressions for the cumulants of a distribution in terms of its
moments by differentiating (30.92) with respect to t to give
1 dMX
MX dt
Expanding each term as power series in t and cross-multiplying, we obtain
1 + µ1 t + µ2 + · · · = µ1 + µ2 t + µ3 + · · · ,
κ1 + κ2 t + κ3 + · · ·
and, on equating coefficients of like powers of t on each side, we find
µ1 = κ1 ,
µ2 = κ2 + κ1 µ1 ,
µ3 = κ3 + 2κ2 µ1 + κ1 µ2 ,
µ4 = κ4 + 3κ3 µ1 + 3κ2 µ2 + κ1 µ3 ,
µk = κk + k−1 C1 κk−1 µ1 + · · · + k−1 Cr κk−r µr + · · · + κ1 µk−1 .
Solving these equations for the κk , we obtain (for the first four cumulants)
κ1 = µ1 ,
κ2 = µ2 − µ21 = ν2 ,
κ3 = µ3 − 3µ2 µ1 + 2µ31 = ν3 ,
κ4 = µ4 − 4µ3 µ1 + 12µ2 µ21 − 3µ22 − 6µ41 = ν4 − 3ν22 .
Higher-order cumulants may be calculated in the same way but become increasingly lengthy to write out in full.
The principal property of cumulants is their additivity, which may be proved
by combining (30.92) with (30.90). If X1 , X2 , . . . , XN are independent random
variables and KXi (t) for i = 1, 2, . . . , N is the CGF for Xi then the CGF of
SN = c1 X1 + c2 X2 + · · · + cN XN (where the ci are constants) is given by
KSN (t) =
KXi (ci t).
Cumulants also have the useful property that, under a change of origin X →
X + a the first cumulant undergoes the change κ1 → κ1 + a but all higher-order
cumulants remain unchanged. Under a change of scale X → bX, cumulant κr
undergoes the change κr → br κr .
Fly UP