…variance of both sides of (30.66), and using (30.68), we find

$$
V[\,Z(X,Y)\,] \approx \left(\frac{\partial Z}{\partial X}\right)^2 V[X] + \left(\frac{\partial Z}{\partial Y}\right)^2 V[Y],
\qquad (30.69)
$$

the partial derivatives being evaluated at $X = \mu_X$ and $Y = \mu_Y$.

30.7 Generating functions

As we saw in chapter 16, when dealing with particular sets of functions $f_n$, each member of the set being characterised by a different non-negative integer $n$, it is sometimes possible to summarise the whole set by a single function of a dummy variable (say $t$), called a generating function. The relationship between the generating function and the $n$th member $f_n$ of the set is that if the generating function is expanded as a power series in $t$ then $f_n$ is the coefficient of $t^n$. For example, in the expansion of the generating function $G(z,t) = (1 - 2zt + t^2)^{-1/2}$, the coefficient of $t^n$ is the $n$th Legendre polynomial $P_n(z)$, i.e.

$$
G(z,t) = (1 - 2zt + t^2)^{-1/2} = \sum_{n=0}^{\infty} P_n(z)\, t^n.
$$

We found that many useful properties of, and relationships between, the members of a set of functions could be established using the generating function and other functions obtained from it, e.g. its derivatives.

Similar ideas can be used in the area of probability theory, and two types of generating function can be usefully defined, one more generally applicable than the other. The more restricted of the two, applicable only to discrete integral distributions, is called a probability generating function; this is discussed in the next section. The second type, a moment generating function, can be used with both discrete and continuous distributions and is considered in subsection 30.7.2. From the moment generating function, we may also construct the closely related characteristic and cumulant generating functions; these are discussed in subsections 30.7.3 and 30.7.4 respectively.

30.7.1 Probability generating functions

As already indicated, probability generating functions are restricted in applicability to integer distributions, of which the most common (the binomial, the Poisson and the geometric) are considered in this and later subsections. In such distributions a random variable may take only non-negative integer values. The actual possible values may be finite or infinite in number, but, for formal purposes, all integers, $0, 1, 2, \ldots$ are considered possible. If only a finite number of integer values can occur in any particular case then those that cannot occur are included but are assigned zero probability.

If, as previously, the probability that the random variable $X$ takes the value $x_n$ is $f(x_n)$, then

$$
\sum_n f(x_n) = 1.
$$

In the present case, however, only non-negative integer values of $x_n$ are possible, and we can, without ambiguity, write the probability that $X$ takes the value $n$ as $f_n$, with

$$
\sum_{n=0}^{\infty} f_n = 1.
\qquad (30.70)
$$

We may now define the probability generating function $\Phi_X(t)$ by

$$
\Phi_X(t) \equiv \sum_{n=0}^{\infty} f_n t^n.
\qquad (30.71)
$$

It is immediately apparent that $\Phi_X(t) = E[t^X]$ and that, by virtue of (30.70), $\Phi_X(1) = 1$.

Probably the simplest example of a probability generating function (PGF) is provided by the random variable $X$ defined by

$$
X = \begin{cases} 1 & \text{if the outcome of a single trial is a `success',} \\ 0 & \text{if the trial ends in `failure'.} \end{cases}
$$

If the probability of success is $p$ and that of failure $q\ (= 1 - p)$ then

$$
\Phi_X(t) = q t^0 + p t^1 + 0 + 0 + \cdots = q + pt.
\qquad (30.72)
$$

This type of random variable is discussed much more fully in subsection 30.8.1.

In a similar but slightly more complicated way, a Poisson-distributed integer variable with mean $\lambda$ (see subsection 30.8.4) has a PGF

$$
\Phi_X(t) = \sum_{n=0}^{\infty} \frac{e^{-\lambda}\lambda^n}{n!}\, t^n = e^{-\lambda} e^{\lambda t}.
\qquad (30.73)
$$

We note that, as required, $\Phi_X(1) = 1$ in both cases.
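As an illustrative check (not part of the original text), the short sympy sketch below reproduces the two PGFs just quoted: the Bernoulli PGF (30.72) and, by carrying out the sum in (30.73) explicitly, the Poisson PGF; it also confirms that both satisfy $\Phi_X(1) = 1$. The symbol names are chosen for the example only.

```python
import sympy as sp

t, p, lam = sp.symbols('t p lambda', positive=True)
n = sp.symbols('n', integer=True, nonnegative=True)

# Single Bernoulli trial, equation (30.72): Phi_X(t) = q + p*t with q = 1 - p.
Phi_bern = (1 - p) + p*t

# Poisson variable with mean lambda: carry out the sum in (30.73) explicitly,
# using lambda**n * t**n = (lambda*t)**n.
Phi_pois = sp.exp(-lam) * sp.summation((lam*t)**n / sp.factorial(n), (n, 0, sp.oo))

print(sp.simplify(Phi_pois))              # expected: exp(lambda*(t - 1)), i.e. e^-lambda * e^(lambda*t)
print(Phi_bern.subs(t, 1))                # 1
print(sp.simplify(Phi_pois.subs(t, 1)))   # 1
```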
Useful results will be obtained from this kind of approach only if the summation (30.71) can be carried out explicitly in particular cases and the functions derived from $\Phi_X(t)$ can be shown to be related to meaningful parameters. Two such relationships can be obtained by differentiating (30.71) with respect to $t$. Taking the first derivative we find

$$
\frac{d\Phi_X(t)}{dt} = \sum_{n=0}^{\infty} n f_n t^{n-1}
\quad\Rightarrow\quad
\Phi_X'(1) = \sum_{n=0}^{\infty} n f_n = E[X],
\qquad (30.74)
$$

and differentiating once more we obtain

$$
\frac{d^2\Phi_X(t)}{dt^2} = \sum_{n=0}^{\infty} n(n-1) f_n t^{n-2}
\quad\Rightarrow\quad
\Phi_X''(1) = \sum_{n=0}^{\infty} n(n-1) f_n = E[X(X-1)].
\qquad (30.75)
$$

Equation (30.74) shows that $\Phi_X'(1)$ gives the mean of $X$. Using both (30.75) and (30.51) allows us to write

$$
\begin{aligned}
\Phi_X''(1) + \Phi_X'(1) - \left[\Phi_X'(1)\right]^2
&= E[X(X-1)] + E[X] - (E[X])^2 \\
&= E\left[X^2\right] - E[X] + E[X] - (E[X])^2 \\
&= E\left[X^2\right] - (E[X])^2 \\
&= V[X],
\end{aligned}
\qquad (30.76)
$$

and so express the variance of $X$ in terms of the derivatives of its probability generating function.

Example: A random variable $X$ is given by the number of trials needed to obtain a first success when the chance of success at each trial is constant and equal to $p$. Find the probability generating function for $X$ and use it to determine the mean and variance of $X$.

Clearly, at least one trial is needed, and so $f_0 = 0$. If $n\ (\geq 1)$ trials are needed for the first success, the first $n-1$ trials must have resulted in failure. Thus

$$
\Pr(X = n) = q^{n-1} p, \qquad n \geq 1,
\qquad (30.77)
$$

where $q = 1 - p$ is the probability of failure in each individual trial. The corresponding probability generating function is thus

$$
\Phi_X(t) = \sum_{n=0}^{\infty} f_n t^n = \sum_{n=1}^{\infty} (q^{n-1}p)\, t^n
= \frac{p}{q}\sum_{n=1}^{\infty} (qt)^n = \frac{p}{q} \times \frac{qt}{1-qt} = \frac{pt}{1-qt},
\qquad (30.78)
$$

where we have used the result for the sum of a geometric series, given in chapter 4, to obtain a closed-form expression for $\Phi_X(t)$. Again, as must be the case, $\Phi_X(1) = 1$.

To find the mean and variance of $X$ we need to evaluate $\Phi_X'(1)$ and $\Phi_X''(1)$. Differentiating (30.78) gives

$$
\Phi_X'(t) = \frac{p}{(1-qt)^2} \quad\Rightarrow\quad \Phi_X'(1) = \frac{p}{p^2} = \frac{1}{p},
$$
$$
\Phi_X''(t) = \frac{2pq}{(1-qt)^3} \quad\Rightarrow\quad \Phi_X''(1) = \frac{2pq}{p^3} = \frac{2q}{p^2}.
$$

Thus, using (30.74) and (30.76),

$$
E[X] = \Phi_X'(1) = \frac{1}{p},
\qquad
V[X] = \Phi_X''(1) + \Phi_X'(1) - \left[\Phi_X'(1)\right]^2 = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{q}{p^2}.
$$

A distribution with probabilities of the general form (30.77) is known as a geometric distribution and is discussed in subsection 30.8.2. This form of distribution is common in 'waiting time' problems (subsection 30.9.3).
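As a quick check on this example (my own illustration, not from the text), the following sympy fragment differentiates the geometric PGF (30.78) and applies (30.74) and (30.76) to recover the mean and variance just derived.

```python
import sympy as sp

t, p = sp.symbols('t p', positive=True)
q = 1 - p

# PGF of the geometric distribution (number of trials up to the first success), eq. (30.78).
Phi = p*t / (1 - q*t)

mean = sp.simplify(sp.diff(Phi, t).subs(t, 1))        # Phi'(1) = E[X]
second = sp.simplify(sp.diff(Phi, t, 2).subs(t, 1))   # Phi''(1) = E[X(X-1)]
var = sp.simplify(second + mean - mean**2)            # eq. (30.76)

print(mean)   # 1/p
print(var)    # (1 - p)/p**2, i.e. q/p**2
```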
Sums of random variables

We now turn to considering the sum of two or more independent random variables, say $X$ and $Y$, and denote by $S_2$ the random variable $S_2 = X + Y$. If $\Phi_{S_2}(t)$ is the PGF for $S_2$, the coefficient of $t^n$ in its expansion is given by the probability that $X + Y = n$ and is thus equal to the sum of the probabilities that $X = r$ and $Y = n - r$ for all values of $r$ in $0 \leq r \leq n$. Since such outcomes for different values of $r$ are mutually exclusive, we have

$$
\Pr(X + Y = n) = \sum_{r=0}^{\infty} \Pr(X = r)\Pr(Y = n - r).
\qquad (30.79)
$$

Multiplying both sides of (30.79) by $t^n$ and summing over all values of $n$ enables us to express this relationship in terms of probability generating functions as follows:

$$
\begin{aligned}
\Phi_{X+Y}(t) = \sum_{n=0}^{\infty} \Pr(X + Y = n)\, t^n
&= \sum_{n=0}^{\infty} \sum_{r=0}^{n} \Pr(X = r)\, t^r \Pr(Y = n - r)\, t^{n-r} \\
&= \sum_{r=0}^{\infty} \sum_{n=r}^{\infty} \Pr(X = r)\, t^r \Pr(Y = n - r)\, t^{n-r}.
\end{aligned}
$$

[Figure 30.10: the pairs of values of $n$ and $r$ used in the evaluation of $\Phi_{X+Y}(t)$.]

The change in summation order is justified by reference to figure 30.10, which illustrates that the summations are over exactly the same pairs of values of $n$ and $r$, but with the first (inner) summation over the points in a column rather than over the points in a row. Now, setting $n = r + s$ gives the final result,

$$
\Phi_{X+Y}(t) = \sum_{r=0}^{\infty} \Pr(X = r)\, t^r \sum_{s=0}^{\infty} \Pr(Y = s)\, t^s = \Phi_X(t)\,\Phi_Y(t),
\qquad (30.80)
$$

i.e. the PGF of the sum of two independent random variables is equal to the product of their individual PGFs. The same result can be deduced in a less formal way by noting that if $X$ and $Y$ are independent then

$$
E\left[t^{X+Y}\right] = E\left[t^X\right] E\left[t^Y\right].
$$

Clearly result (30.80) can be extended to more than two random variables by writing $S_3 = S_2 + Z$ etc., to give

$$
\Phi_{\left(\sum_{i=1}^{n} X_i\right)}(t) = \prod_{i=1}^{n} \Phi_{X_i}(t),
\qquad (30.81)
$$

and, further, if all the $X_i$ have the same probability distribution,

$$
\Phi_{\left(\sum_{i=1}^{n} X_i\right)}(t) = \left[\Phi_X(t)\right]^n.
\qquad (30.82)
$$

This latter result has immediate application in the deduction of the PGF for the binomial distribution from that for a single trial, equation (30.72).

Variable-length sums of random variables

As a final result in the theory of probability generating functions we show how to calculate the PGF for a sum of $N$ random variables, all with the same probability distribution, when the value of $N$ is itself a random variable but one with a known probability distribution. In symbols, we wish to find the distribution of

$$
S_N = X_1 + X_2 + \cdots + X_N,
\qquad (30.83)
$$

where $N$ is a random variable with $\Pr(N = n) = h_n$ and PGF $\chi_N(t) = \sum_n h_n t^n$. The probability $\xi_k$ that $S_N = k$ is given by a sum of conditional probabilities, namely§

$$
\begin{aligned}
\xi_k &= \sum_{n=0}^{\infty} \Pr(N = n)\Pr(X_0 + X_1 + X_2 + \cdots + X_n = k) \\
&= \sum_{n=0}^{\infty} h_n \times \text{coefficient of } t^k \text{ in } \left[\Phi_X(t)\right]^n.
\end{aligned}
$$

Multiplying both sides of this equation by $t^k$ and summing over all $k$, we obtain an expression for the PGF $\Xi_S(t)$ of $S_N$:

$$
\begin{aligned}
\Xi_S(t) = \sum_{k=0}^{\infty} \xi_k t^k
&= \sum_{k=0}^{\infty} t^k \sum_{n=0}^{\infty} h_n \times \text{coefficient of } t^k \text{ in } \left[\Phi_X(t)\right]^n \\
&= \sum_{n=0}^{\infty} h_n \sum_{k=0}^{\infty} t^k \times \text{coefficient of } t^k \text{ in } \left[\Phi_X(t)\right]^n \\
&= \sum_{n=0}^{\infty} h_n \left[\Phi_X(t)\right]^n \\
&= \chi_N(\Phi_X(t)).
\end{aligned}
\qquad (30.84)
$$

§ Formally $X_0 = 0$ has to be included, since $\Pr(N = 0)$ may be non-zero.

In words, the PGF of the sum $S_N$ is given by the compound function $\chi_N(\Phi_X(t))$ obtained by substituting $\Phi_X(t)$ for $t$ in the PGF for the number of terms $N$ in the sum. We illustrate this with the following example.

Example: The probability distribution for the number of eggs in a clutch is Poisson distributed with mean $\lambda$, and the probability that each egg will hatch is $p$ (and is independent of the size of the clutch). Use the results stated in (30.72) and (30.73) to show that the PGF (and hence the probability distribution) for the number of chicks that hatch corresponds to a Poisson distribution having mean $\lambda p$.

The number of chicks that hatch is given by a sum of the form (30.83) in which $X_i = 1$ if the $i$th egg hatches and $X_i = 0$ if it does not. As given by (30.72), $\Phi_X(t)$ is thus $(1-p) + pt$. The value of $N$ is given by a Poisson distribution with mean $\lambda$; thus, from (30.73), in the terminology of our previous discussion,

$$
\chi_N(t) = e^{-\lambda} e^{\lambda t}.
$$

We now substitute these forms into (30.84) to obtain

$$
\begin{aligned}
\Xi_S(t) &= \exp(-\lambda)\exp\left[\lambda \Phi_X(t)\right] = \exp(-\lambda)\exp\{\lambda[(1-p) + pt]\} \\
&= \exp(-\lambda p)\exp(\lambda p t).
\end{aligned}
$$

But this is exactly the PGF of a Poisson distribution with mean $\lambda p$. That the number of hatched chicks is therefore Poisson distributed is intuitively obvious since, in the expansion of the PGF as a power series in $t$, every coefficient will be precisely that implied by such a distribution. A solution of the same problem by direct calculation appears in the answer to exercise 30.29.
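The substitution in (30.84) is mechanical enough to be checked symbolically. The sketch below (my own illustration, with illustrative symbol names) composes the Bernoulli PGF (30.72) with the Poisson PGF (30.73) and simplifies the result.

```python
import sympy as sp

t, p, lam = sp.symbols('t p lambda', positive=True)

# PGF of a single egg hatching (Bernoulli), equation (30.72).
Phi_X = (1 - p) + p*t

# PGF of the clutch size N (Poisson with mean lambda), equation (30.73).
chi_N = sp.exp(-lam) * sp.exp(lam*t)

# Compound rule (30.84): substitute Phi_X(t) for t in chi_N(t).
Xi_S = chi_N.subs(t, Phi_X)

# Should print a form equivalent to exp(lambda*p*(t - 1)): the PGF of a
# Poisson distribution with mean lambda*p, as derived in the worked example.
print(sp.simplify(Xi_S))
```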
30.7.2 Moment generating functions

As we saw in section 30.5, a probability function is often expressed in terms of its moments. This leads naturally to the second type of generating function, a moment generating function. For a random variable $X$, and a real number $t$, the moment generating function (MGF) is defined by

$$
M_X(t) = E\left[e^{tX}\right] =
\begin{cases}
\displaystyle\sum_i e^{t x_i} f(x_i) & \text{for a discrete distribution,} \\[2ex]
\displaystyle\int e^{t x} f(x)\, dx & \text{for a continuous distribution.}
\end{cases}
\qquad (30.85)
$$

The MGF will exist for all values of $t$ provided that $X$ is bounded, and it always exists at the point $t = 0$, where $M(0) = E[1] = 1$.

It will be apparent that the PGF and the MGF for a random variable $X$ are closely related. The former is the expectation of $t^X$ whilst the latter is the expectation of $e^{tX}$:

$$
\Phi_X(t) = E\left[t^X\right], \qquad M_X(t) = E\left[e^{tX}\right].
$$

The MGF can thus be obtained from the PGF by replacing $t$ by $e^t$, and vice versa. The MGF has more general applicability, however, since it can be used with both continuous and discrete distributions whilst the PGF is restricted to non-negative integer distributions.

As its name suggests, the MGF is particularly useful for obtaining the moments of a distribution, as is easily seen by noting that

$$
E\left[e^{tX}\right] = E\left[1 + tX + \frac{t^2 X^2}{2!} + \cdots\right]
= 1 + E[X]\, t + E\left[X^2\right]\frac{t^2}{2!} + \cdots.
$$

Assuming that the MGF exists for all $t$ around the point $t = 0$, we can deduce that the moments of a distribution are given in terms of its MGF by

$$
E[X^n] = \left.\frac{d^n M_X(t)}{dt^n}\right|_{t=0}.
\qquad (30.86)
$$

Similarly, by substitution in (30.51), the variance of the distribution is given by

$$
V[X] = M_X''(0) - \left[M_X'(0)\right]^2,
\qquad (30.87)
$$

where the prime denotes differentiation with respect to $t$.

Example: The MGF for the Gaussian distribution (see the end of subsection 30.9.1) is given by $M_X(t) = \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)$. Find the expectation and variance of this distribution.

Using (30.86),

$$
M_X'(t) = \left(\mu + \sigma^2 t\right) \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)
\quad\Rightarrow\quad E[X] = M_X'(0) = \mu,
$$
$$
M_X''(t) = \left[\sigma^2 + \left(\mu + \sigma^2 t\right)^2\right] \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)
\quad\Rightarrow\quad M_X''(0) = \sigma^2 + \mu^2.
$$

Thus, using (30.87),

$$
V[X] = \sigma^2 + \mu^2 - \mu^2 = \sigma^2.
$$

That the mean is found to be $\mu$ and the variance $\sigma^2$ justifies the use of these symbols in the Gaussian distribution.

The moment generating function has several useful properties that follow from its definition and can be employed in simplifying calculations.

Scaling and shifting

If $Y = aX + b$, where $a$ and $b$ are arbitrary constants, then

$$
M_Y(t) = E\left[e^{tY}\right] = E\left[e^{t(aX+b)}\right] = e^{bt} E\left[e^{atX}\right] = e^{bt} M_X(at).
\qquad (30.88)
$$

This result is often useful for obtaining the central moments of a distribution. If the MGF of $X$ is $M_X(t)$ then the variable $Y = X - \mu$ has the MGF $M_Y(t) = e^{-\mu t} M_X(t)$, which clearly generates the central moments of $X$, i.e.

$$
E[(X - \mu)^n] = E[Y^n] = M_Y^{(n)}(0) = \left.\frac{d^n}{dt^n}\left[e^{-\mu t} M_X(t)\right]\right|_{t=0}.
$$

Sums of random variables

If $X_1, X_2, \ldots, X_N$ are independent random variables and $S_N = X_1 + X_2 + \cdots + X_N$ then

$$
M_{S_N}(t) = E\left[e^{t S_N}\right] = E\left[e^{t(X_1 + X_2 + \cdots + X_N)}\right] = E\left[\prod_{i=1}^{N} e^{t X_i}\right].
$$

Since the $X_i$ are independent,

$$
M_{S_N}(t) = \prod_{i=1}^{N} E\left[e^{t X_i}\right] = \prod_{i=1}^{N} M_{X_i}(t).
\qquad (30.89)
$$

In words, the MGF of the sum of $N$ independent random variables is the product of their individual MGFs. By combining (30.89) with (30.88), we obtain the more general result that the MGF of $S_N = c_1 X_1 + c_2 X_2 + \cdots + c_N X_N$ (where the $c_i$ are constants) is given by

$$
M_{S_N}(t) = \prod_{i=1}^{N} M_{X_i}(c_i t).
\qquad (30.90)
$$
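The sympy sketch below (my own check, with illustrative symbol names) applies (30.86) and (30.87) to the Gaussian MGF of the worked example, and then uses (30.89) to show that the product of two Gaussian MGFs is again of Gaussian form, so that the means and variances of independent Gaussian variables simply add.

```python
import sympy as sp

t = sp.symbols('t', real=True)
mu1, mu2 = sp.symbols('mu1 mu2', real=True)
s1, s2 = sp.symbols('sigma1 sigma2', positive=True)

def gaussian_mgf(mu, sigma):
    """MGF of a Gaussian with mean mu and standard deviation sigma."""
    return sp.exp(mu*t + sp.Rational(1, 2)*sigma**2*t**2)

# Moments of a single Gaussian from (30.86); variance from (30.87).
M = gaussian_mgf(mu1, s1)
mean = sp.diff(M, t).subs(t, 0)
var = sp.simplify(sp.diff(M, t, 2).subs(t, 0) - mean**2)
print(mean, var)                       # mu1  sigma1**2

# Equation (30.89): the MGF of an independent sum is the product of the MGFs.
M_sum = gaussian_mgf(mu1, s1) * gaussian_mgf(mu2, s2)
mean_sum = sp.diff(M_sum, t).subs(t, 0)
var_sum = sp.simplify(sp.diff(M_sum, t, 2).subs(t, 0) - mean_sum**2)
print(sp.simplify(mean_sum), var_sum)  # mu1 + mu2  sigma1**2 + sigma2**2
```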
Variable-length sums of random variables

Let us consider the sum of $N$ independent random variables $X_i$ ($i = 1, 2, \ldots, N$), all with the same probability distribution, and let us suppose that $N$ is itself a random variable with a known distribution. Following the notation of section 30.7.1,

$$
S_N = X_1 + X_2 + \cdots + X_N,
$$

where $N$ is a random variable with $\Pr(N = n) = h_n$ and probability generating function $\chi_N(t) = \sum_n h_n t^n$. For definiteness, let us assume that the $X_i$ are continuous RVs (an analogous discussion can be given in the discrete case). Thus, the probability that the value of $S_N$ lies in the interval $s$ to $s + ds$ is given by§

$$
\Pr(s < S_N \leq s + ds) = \sum_{n=0}^{\infty} \Pr(N = n) \Pr(s < X_0 + X_1 + X_2 + \cdots + X_n \leq s + ds).
$$

Write $\Pr(s < S_N \leq s + ds)$ as $f_N(s)\, ds$ and $\Pr(s < X_0 + X_1 + X_2 + \cdots + X_n \leq s + ds)$ as $f_n(s)\, ds$. The $k$th moment of the PDF $f_N(s)$ is given by

$$
\begin{aligned}
\mu_k = \int s^k f_N(s)\, ds
&= \sum_{n=0}^{\infty} \Pr(N = n) \int s^k f_n(s)\, ds \\
&= \sum_{n=0}^{\infty} h_n \times \left(k! \times \text{coefficient of } t^k \text{ in } \left[M_X(t)\right]^n\right).
\end{aligned}
$$

Thus the MGF of $S_N$ is given by

$$
\begin{aligned}
M_{S_N}(t) = \sum_{k=0}^{\infty} \frac{\mu_k}{k!}\, t^k
&= \sum_{n=0}^{\infty} h_n \sum_{k=0}^{\infty} t^k \times \text{coefficient of } t^k \text{ in } \left[M_X(t)\right]^n \\
&= \sum_{n=0}^{\infty} h_n \left[M_X(t)\right]^n \\
&= \chi_N(M_X(t)).
\end{aligned}
$$

§ As in the previous section, $X_0$ has to be formally included, since $\Pr(N = 0)$ may be non-zero.

In words, the MGF of the sum $S_N$ is given by the compound function $\chi_N(M_X(t))$ obtained by substituting $M_X(t)$ for $t$ in the PGF for the number of terms $N$ in the sum.

Uniqueness

If the MGF of the random variable $X_1$ is identical to that for $X_2$ then the probability distributions of $X_1$ and $X_2$ are identical. This is intuitively reasonable although a rigorous proof is complicated¶ and beyond the scope of this book.

¶ See, for example, P. A. Moran, An Introduction to Probability Theory (New York: Oxford Science Publications, 1984).

30.7.3 Characteristic function

The characteristic function (CF) of a random variable $X$ is defined as

$$
C_X(t) = E\left[e^{itX}\right] =
\begin{cases}
\displaystyle\sum_j e^{i t x_j} f(x_j) & \text{for a discrete distribution,} \\[2ex]
\displaystyle\int e^{i t x} f(x)\, dx & \text{for a continuous distribution,}
\end{cases}
\qquad (30.91)
$$

so that $C_X(t) = M_X(it)$, where $M_X(t)$ is the MGF of $X$. Clearly, the characteristic function and the MGF are very closely related and can be used interchangeably. Because of the formal similarity between the definitions of $C_X(t)$ and $M_X(t)$, the characteristic function possesses analogous properties to those listed in the previous section for the MGF, with only minor modifications. Indeed, by substituting $it$ for $t$ in any of the relations obeyed by the MGF and noting that $C_X(t) = M_X(it)$, we obtain the corresponding relationship for the characteristic function. Thus, for example, the moments of $X$ are given in terms of the derivatives of $C_X(t)$ by

$$
E[X^n] = (-i)^n C_X^{(n)}(0).
$$

Similarly, if $Y = aX + b$ then $C_Y(t) = e^{ibt} C_X(at)$.

Whether to describe a random variable by its characteristic function or by its MGF is partly a matter of personal preference. However, the use of the CF does have some advantages. Most importantly, the replacement of the exponential $e^{tX}$ in the definition of the MGF by the complex oscillatory function $e^{itX}$ in the CF means that in the latter we avoid any difficulties associated with convergence of the relevant sum or integral. Furthermore, when $X$ is a continuous RV, we see from (30.91) that $C_X(t)$ is related to the Fourier transform of the PDF $f(x)$. As a consequence of Fourier's inversion theorem, we may obtain $f(x)$ from $C_X(t)$ by performing the inverse transform

$$
f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} C_X(t)\, e^{-itx}\, dt.
$$
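As a small numerical illustration of this inversion formula (not part of the original text), the fragment below takes the characteristic function $C_X(t) = e^{-t^2/2}$ of a standard Gaussian and recovers the familiar PDF values by approximating the inverse transform with a simple Riemann sum.

```python
import numpy as np

# Characteristic function of a standard Gaussian (mu = 0, sigma = 1): C(t) = exp(-t**2/2).
t = np.linspace(-40.0, 40.0, 20001)
dt = t[1] - t[0]
cf = np.exp(-t**2 / 2)

# Invert with f(x) = (1/2pi) * integral of C(t) exp(-i t x) dt, as a Riemann sum on the grid.
for x in (0.0, 1.0, 2.0):
    f_x = (np.sum(cf * np.exp(-1j * t * x)) * dt).real / (2 * np.pi)
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(f"x = {x}: recovered {f_x:.6f}, exact {exact:.6f}")
```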
30.7.4 Cumulant generating function

As mentioned at the end of subsection 30.5.5, we may also describe a probability density function $f(x)$ in terms of its cumulants. These quantities may be expressed in terms of the moments of the distribution and are important in sampling theory, which we discuss in the next chapter. The cumulants of a distribution are best defined in terms of its cumulant generating function (CGF), given by $K_X(t) = \ln M_X(t)$, where $M_X(t)$ is the MGF of the distribution. If $K_X(t)$ is expanded as a power series in $t$ then the $k$th cumulant $\kappa_k$ of $f(x)$ is the coefficient of $t^k/k!$:

$$
K_X(t) = \ln M_X(t) \equiv \kappa_1 t + \kappa_2 \frac{t^2}{2!} + \kappa_3 \frac{t^3}{3!} + \cdots.
\qquad (30.92)
$$

Since $M_X(0) = 1$, $K_X(t)$ contains no constant term.

Example: Find all the cumulants of the Gaussian distribution discussed in the previous example.

The moment generating function for the Gaussian distribution is $M_X(t) = \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)$. Thus, the cumulant generating function has the simple form

$$
K_X(t) = \ln M_X(t) = \mu t + \tfrac{1}{2}\sigma^2 t^2.
$$

Comparing this expression with (30.92), we find that $\kappa_1 = \mu$, $\kappa_2 = \sigma^2$ and all other cumulants are equal to zero.

We may obtain expressions for the cumulants of a distribution in terms of its moments by differentiating (30.92) with respect to $t$ to give

$$
\frac{dK_X}{dt} = \frac{1}{M_X}\frac{dM_X}{dt}.
$$

Expanding each term as a power series in $t$ and cross-multiplying, we obtain

$$
\left(\kappa_1 + \kappa_2 t + \kappa_3 \frac{t^2}{2!} + \cdots\right)\left(1 + \mu_1 t + \mu_2 \frac{t^2}{2!} + \cdots\right)
= \mu_1 + \mu_2 t + \mu_3 \frac{t^2}{2!} + \cdots,
$$

and, on equating coefficients of like powers of $t$ on each side, we find

$$
\begin{aligned}
\mu_1 &= \kappa_1, \\
\mu_2 &= \kappa_2 + \kappa_1 \mu_1, \\
\mu_3 &= \kappa_3 + 2\kappa_2 \mu_1 + \kappa_1 \mu_2, \\
\mu_4 &= \kappa_4 + 3\kappa_3 \mu_1 + 3\kappa_2 \mu_2 + \kappa_1 \mu_3, \\
&\ \ \vdots \\
\mu_k &= \kappa_k + {}^{k-1}C_1\, \kappa_{k-1}\mu_1 + \cdots + {}^{k-1}C_r\, \kappa_{k-r}\mu_r + \cdots + \kappa_1 \mu_{k-1}.
\end{aligned}
$$

Solving these equations for the $\kappa_k$, we obtain (for the first four cumulants)

$$
\begin{aligned}
\kappa_1 &= \mu_1, \\
\kappa_2 &= \mu_2 - \mu_1^2 = \nu_2, \\
\kappa_3 &= \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3 = \nu_3, \\
\kappa_4 &= \mu_4 - 4\mu_3\mu_1 + 12\mu_2\mu_1^2 - 3\mu_2^2 - 6\mu_1^4 = \nu_4 - 3\nu_2^2.
\end{aligned}
\qquad (30.93)
$$

Higher-order cumulants may be calculated in the same way but become increasingly lengthy to write out in full.

The principal property of cumulants is their additivity, which may be proved by combining (30.92) with (30.90). If $X_1, X_2, \ldots, X_N$ are independent random variables and $K_{X_i}(t)$ for $i = 1, 2, \ldots, N$ is the CGF for $X_i$, then the CGF of $S_N = c_1 X_1 + c_2 X_2 + \cdots + c_N X_N$ (where the $c_i$ are constants) is given by

$$
K_{S_N}(t) = \sum_{i=1}^{N} K_{X_i}(c_i t).
$$

Cumulants also have the useful property that, under a change of origin $X \to X + a$, the first cumulant undergoes the change $\kappa_1 \to \kappa_1 + a$ but all higher-order cumulants remain unchanged. Under a change of scale $X \to bX$, cumulant $\kappa_r$ undergoes the change $\kappa_r \to b^r \kappa_r$.
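The moment-cumulant relations (30.93) can be reproduced symbolically. In the sketch below (my own illustration; the symbol names are chosen for the example), a generic MGF is written out to fourth order in terms of its moments, its logarithm is expanded as a series, and each cumulant is read off as $k!$ times the coefficient of $t^k$, as in (30.92).

```python
import sympy as sp

t = sp.symbols('t')
mu = sp.symbols('mu1:5')   # the first four moments mu1, mu2, mu3, mu4

# Generic MGF expanded to order t**4 in terms of its (non-central) moments.
M = 1 + sum(mu[k-1] * t**k / sp.factorial(k) for k in range(1, 5))

# CGF K(t) = ln M(t); the k-th cumulant is k! times the coefficient of t**k, eq. (30.92).
K = sp.expand(sp.series(sp.log(M), t, 0, 5).removeO())
for k in range(1, 5):
    kappa = sp.expand(sp.factorial(k) * K.coeff(t, k))
    print(f"kappa_{k} =", kappa)
# Expected output reproduces (30.93), e.g. kappa_2 = mu2 - mu1**2.
```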