Important discrete distributions
PROBABILITY

Table 30.1 Some important discrete probability distributions.

Distribution | Probability law $f(x)$ | MGF | $E[X]$ | $V[X]$
binomial | ${}^nC_x\, p^x q^{n-x}$ | $(pe^t + q)^n$ | $np$ | $npq$
negative binomial | ${}^{r+x-1}C_x\, p^r q^x$ | $\left(\dfrac{p}{1 - qe^t}\right)^r$ | $\dfrac{rq}{p}$ | $\dfrac{rq}{p^2}$
geometric | $q^{x-1}p$ | $\dfrac{pe^t}{1 - qe^t}$ | $\dfrac{1}{p}$ | $\dfrac{q}{p^2}$
hypergeometric | $\dfrac{(Np)!\,(Nq)!\,n!\,(N-n)!}{x!\,(Np-x)!\,(n-x)!\,(Nq-n+x)!\,N!}$ | | $np$ | $\dfrac{N-n}{N-1}\,npq$
Poisson | $\dfrac{\lambda^x}{x!}\,e^{-\lambda}$ | $e^{\lambda(e^t-1)}$ | $\lambda$ | $\lambda$

30.8 Important discrete distributions

Having discussed some general properties of distributions, we now consider the more important discrete distributions encountered in physical applications. These are discussed in detail below, and summarised for convenience in table 30.1; we refer the reader to the relevant section below for an explanation of the symbols used.

30.8.1 The binomial distribution

Perhaps the most important discrete probability distribution is the binomial distribution. This distribution describes processes that consist of a number of independent identical trials with two possible outcomes, $A$ and $B = \bar{A}$. We may call these outcomes ‘success’ and ‘failure’ respectively. If the probability of a success is $\Pr(A) = p$ then the probability of a failure is $\Pr(B) = q = 1 - p$. If we perform $n$ trials then the discrete random variable

X = number of times A occurs

can take the values $0, 1, 2, \ldots, n$; its distribution amongst these values is described by the binomial distribution.

We now calculate the probability that in $n$ trials we obtain $x$ successes (and so $n - x$ failures). One way of obtaining such a result is to have $x$ successes followed by $n - x$ failures. Since the trials are assumed independent, the probability of this is

$$\underbrace{pp \cdots p}_{x\ \text{times}} \times \underbrace{qq \cdots q}_{n-x\ \text{times}} = p^x q^{n-x}.$$

This is, however, just one permutation of $x$ successes and $n - x$ failures.
The total number of permutations of $n$ objects, of which $x$ are identical and of type 1 and $n - x$ are identical and of type 2, is given by (30.33) as

$$\frac{n!}{x!\,(n-x)!} \equiv {}^nC_x.$$

Therefore, the total probability of obtaining $x$ successes from $n$ trials is

$$f(x) = \Pr(X = x) = {}^nC_x\, p^x q^{n-x} = {}^nC_x\, p^x (1-p)^{n-x}, \tag{30.94}$$

which is the binomial probability distribution formula. When a random variable $X$ follows the binomial distribution for $n$ trials, with a probability of success $p$, we write $X \sim \text{Bin}(n, p)$. Then the random variable $X$ is often referred to as a binomial variate. Some typical binomial distributions are shown in figure 30.11.

[Figure 30.11 Some typical binomial distributions with various combinations of parameters $n$ and $p$. Panels: $n = 5$, $p = 0.6$; $n = 5$, $p = 0.167$; $n = 10$, $p = 0.6$; $n = 10$, $p = 0.167$. Axes: $f(x)$ against $x$.]

If a single six-sided die is rolled five times, what is the probability that a six is thrown exactly three times?

Here the number of ‘trials’ $n = 5$, and we are interested in the random variable

X = number of sixes thrown.

Since the probability of a ‘success’ is $p = \frac{1}{6}$, the probability of obtaining exactly three sixes in five throws is given by (30.94) as

$$\Pr(X = 3) = \frac{5!}{3!\,(5-3)!} \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^{(5-3)} = 0.032.$$

For evaluating binomial probabilities a useful result is the binomial recurrence formula

$$\Pr(X = x + 1) = \frac{p}{q} \left(\frac{n-x}{x+1}\right) \Pr(X = x), \tag{30.95}$$

which enables successive probabilities $\Pr(X = x + k)$, $k = 1, 2, \ldots$, to be calculated once $\Pr(X = x)$ is known; it is often quicker to use than (30.94).

The random variable $X$ is distributed as $X \sim \text{Bin}(3, \frac{1}{2})$. Evaluate the probability function $f(x)$ using the binomial recurrence formula.

The probability $\Pr(X = 0)$ may be calculated using (30.94) and is

$$\Pr(X = 0) = {}^3C_0 \left(\tfrac{1}{2}\right)^0 \left(\tfrac{1}{2}\right)^3 = \tfrac{1}{8}.$$

The ratio $p/q = \frac{1}{2} / \frac{1}{2} = 1$ in this case and so, using the binomial recurrence formula (30.95), we find

$$\Pr(X = 1) = 1 \times \frac{3-0}{0+1} \times \frac{1}{8} = \frac{3}{8},$$

$$\Pr(X = 2) = 1 \times \frac{3-1}{1+1} \times \frac{3}{8} = \frac{3}{8},$$

$$\Pr(X = 3) = 1 \times \frac{3-2}{2+1} \times \frac{3}{8} = \frac{1}{8},$$

results which may be verified by direct application of (30.94).

We note that, as required, the binomial distribution satisfies

$$\sum_{x=0}^{n} f(x) = \sum_{x=0}^{n} {}^nC_x\, p^x q^{n-x} = (p + q)^n = 1.$$

Furthermore, from the definitions of $E[X]$ and $V[X]$ for a discrete distribution, we may show that for the binomial distribution $E[X] = np$ and $V[X] = npq$. The direct summations involved are, however, rather cumbersome and these results are obtained much more simply using the moment generating function.

The moment generating function for the binomial distribution

To find the MGF for the binomial distribution we consider the binomial random variable $X$ to be the sum of the random variables $X_i$, $i = 1, 2, \ldots, n$, which are defined by

$$X_i = \begin{cases} 1 & \text{if a ‘success’ occurs on the $i$th trial,} \\ 0 & \text{if a ‘failure’ occurs on the $i$th trial.} \end{cases}$$

Thus

$$M_i(t) = E\left[e^{tX_i}\right] = e^{0t} \times \Pr(X_i = 0) + e^{1t} \times \Pr(X_i = 1) = 1 \times q + e^t \times p = pe^t + q.$$

From (30.89), it follows that the MGF for the binomial distribution is given by

$$M(t) = \prod_{i=1}^{n} M_i(t) = (pe^t + q)^n. \tag{30.96}$$

We can now use the moment generating function to derive the mean and variance of the binomial distribution. From (30.96)

$$M'(t) = npe^t (pe^t + q)^{n-1},$$

and from (30.86)

$$E[X] = M'(0) = np(p + q)^{n-1} = np,$$

where the last equality follows from $p + q = 1$. Differentiating with respect to $t$ once more gives

$$M''(t) = e^{2t}(n-1)np^2 (pe^t + q)^{n-2} + e^t np (pe^t + q)^{n-1},$$

and from (30.86)

$$E[X^2] = M''(0) = n^2p^2 - np^2 + np.$$

Thus, using (30.87)

$$V[X] = M''(0) - \left[M'(0)\right]^2 = n^2p^2 - np^2 + np - n^2p^2 = np(1 - p) = npq.$$
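The binomial results above can be checked numerically. The following Python sketch is illustrative only and not part of the original text; the function names are assumptions chosen for this example. It evaluates the closed form (30.94), rebuilds the same probabilities from the recurrence (30.95) starting at $\Pr(X = 0) = q^n$, and compares the mean and variance obtained by direct summation with $np$ and $npq$.

```python
from math import comb


def binomial_pmf(n, p):
    """Return [Pr(X=0), ..., Pr(X=n)] from the closed form (30.94)."""
    q = 1.0 - p
    return [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]


def binomial_pmf_recurrence(n, p):
    """Same probabilities via the recurrence (30.95); assumes 0 < p < 1."""
    q = 1.0 - p
    probs = [q**n]  # Pr(X = 0)
    for x in range(n):
        probs.append(probs[-1] * (p / q) * (n - x) / (x + 1))
    return probs


def mean_and_variance(probs):
    """Mean and variance by direct summation over x = 0, 1, ..., n."""
    mean = sum(x * f for x, f in enumerate(probs))
    second_moment = sum(x * x * f for x, f in enumerate(probs))
    return mean, second_moment - mean**2


# Die example: exactly three sixes in five throws.
print(round(binomial_pmf(5, 1 / 6)[3], 3))

# Bin(3, 1/2) built from the recurrence.
print(binomial_pmf_recurrence(3, 0.5))

# Direct summation against np and npq for an illustrative Bin(10, 0.6).
print(mean_and_variance(binomial_pmf(10, 0.6)), (10 * 0.6, 10 * 0.6 * 0.4))
```

The recurrence version avoids recomputing factorials for every $x$, which is the practical point made in the text about (30.95).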
Multiple binomial distributions

Suppose $X$ and $Y$ are two independent random variables, both of which are described by binomial distributions with a common probability of success $p$, but with (in general) different numbers of trials $n_1$ and $n_2$, so that $X \sim \text{Bin}(n_1, p)$ and $Y \sim \text{Bin}(n_2, p)$. Now consider the random variable $Z = X + Y$. We could calculate the probability distribution of $Z$ directly using (30.60), but it is much easier to use the MGF (30.96).

Since $X$ and $Y$ are independent random variables, the MGF $M_Z(t)$ of the new variable $Z = X + Y$ is given simply by the product of the individual MGFs $M_X(t)$ and $M_Y(t)$. Thus, we obtain

$$M_Z(t) = M_X(t) M_Y(t) = (pe^t + q)^{n_1} (pe^t + q)^{n_2} = (pe^t + q)^{n_1 + n_2},$$

which we recognise as the MGF of $Z \sim \text{Bin}(n_1 + n_2, p)$. Hence $Z$ is also described by a binomial distribution.

This result may be extended to any number of binomial distributions. If $X_i$, $i = 1, 2, \ldots, N$, is distributed as $X_i \sim \text{Bin}(n_i, p)$ then $Z = X_1 + X_2 + \cdots + X_N$ is distributed as $Z \sim \text{Bin}(n_1 + n_2 + \cdots + n_N, p)$, as would be expected since the result of $\sum_i n_i$ trials cannot depend on how they are split up. A similar proof is also possible using either the probability or cumulant generating functions.

Unfortunately, no equivalent simple result exists for the probability distribution of the difference $Z = X - Y$ of two binomially distributed variables.

30.8.2 The geometric and negative binomial distributions

A special case of the binomial distribution occurs when instead of the number of successes we consider the discrete random variable

X = number of trials required to obtain the first success.

The probability that $x$ trials are required in order to obtain the first success is simply the probability of obtaining $x - 1$ failures followed by one success. If the probability of a success on each trial is $p$, then for $x > 0$

$$f(x) = \Pr(X = x) = (1 - p)^{x-1} p = q^{x-1} p,$$

where $q = 1 - p$. This distribution is sometimes called the geometric distribution.
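As a small numerical illustration of the geometric probability law just given, the Python sketch below (not part of the original text; the function names and the choice $p = 1/6$ are assumptions made for this example) sums $q^{x-1}p$ over $x = 1, \ldots, k$ and compares the result with $1 - q^k$, the probability that the first success arrives within $k$ trials. The partial sums approach unity as $k$ grows, as they must for a properly normalised distribution.

```python
def geometric_pmf(x, p):
    """Pr(X = x) = q**(x - 1) * p for x = 1, 2, ...; zero otherwise."""
    if x < 1:
        return 0.0
    return (1.0 - p) ** (x - 1) * p


def prob_first_success_within(k, p):
    """Sum the probability law over x = 1, ..., k."""
    return sum(geometric_pmf(x, p) for x in range(1, k + 1))


p = 1 / 6  # illustrative: waiting for the first six with a fair die
for k in (1, 5, 10, 50):
    partial_sum = prob_first_success_within(k, p)
    closed_form = 1.0 - (1.0 - p) ** k  # geometric series summed exactly
    print(k, round(partial_sum, 6), round(closed_form, 6))
```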
The probability generating function for this distribution is given in (30.78). By replacing $t$ by $e^t$ in (30.78) we immediately obtain the MGF of the geometric distribution

$$M(t) = \frac{pe^t}{1 - qe^t},$$

from which its mean and variance are found to be

$$E[X] = \frac{1}{p}, \qquad V[X] = \frac{q}{p^2}.$$

Another distribution closely related to the binomial is the negative binomial distribution. This describes the probability distribution of the random variable

X = number of failures before the rth success.

One way of obtaining $x$ failures before the $r$th success is to have $r - 1$ successes followed by $x$ failures followed by the $r$th success, for which the probability is

$$\underbrace{pp \cdots p}_{r-1\ \text{times}} \times \underbrace{qq \cdots q}_{x\ \text{times}} \times\, p = p^r q^x.$$

However, the first $r + x - 1$ factors constitute just one permutation of $r - 1$ successes and $x$ failures. The total number of permutations of these $r + x - 1$ objects, of which $r - 1$ are identical and of type 1 and $x$ are identical and of type 2, is ${}^{r+x-1}C_x$. Therefore, the total probability of obtaining $x$ failures before the $r$th success is

$$f(x) = \Pr(X = x) = {}^{r+x-1}C_x\, p^r q^x,$$

which is called the negative binomial distribution (see the related discussion on p. 1137). It is straightforward to show that the MGF of this distribution is

$$M(t) = \left(\frac{p}{1 - qe^t}\right)^r,$$

and that its mean and variance are given by

$$E[X] = \frac{rq}{p} \qquad \text{and} \qquad V[X] = \frac{rq}{p^2}.$$

30.8.3 The hypergeometric distribution

In subsection 30.8.1 we saw that the probability of obtaining $x$ successes in $n$ independent trials was given by the binomial distribution. Suppose that these $n$ ‘trials’ actually consist of drawing at random $n$ balls, from a set of $N$ such balls of which $M$ are red and the rest white. Let us consider the random variable $X$ = number of red balls drawn.

On the one hand, if the balls are drawn with replacement then the trials are independent and the probability of drawing a red ball is $p = M/N$ each time.
Therefore, the probability of drawing $x$ red balls in $n$ trials is given by the binomial distribution as

$$\Pr(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x (1 - p)^{n-x}.$$

On the other hand, if the balls are drawn without replacement the trials are not independent and the probability of drawing a red ball depends on how many red balls have already been drawn. We can, however, still derive a general formula for the probability of drawing $x$ red balls in $n$ trials, as follows.

The number of ways of drawing $x$ red balls from $M$ is ${}^MC_x$, and the number of ways of drawing $n - x$ white balls from $N - M$ is ${}^{N-M}C_{n-x}$. Therefore, the total number of ways to obtain $x$ red balls in $n$ trials is ${}^MC_x\, {}^{N-M}C_{n-x}$. However, the total number of ways of drawing $n$ objects from $N$ is simply ${}^NC_n$. Hence the probability of obtaining $x$ red balls in $n$ trials is

$$\Pr(X = x) = \frac{{}^MC_x\, {}^{N-M}C_{n-x}}{{}^NC_n} = \frac{M!}{x!\,(M-x)!}\, \frac{(N-M)!}{(n-x)!\,(N-M-n+x)!}\, \frac{n!\,(N-n)!}{N!}, \tag{30.97}$$

$$\phantom{\Pr(X = x)} = \frac{(Np)!\,(Nq)!\,n!\,(N-n)!}{x!\,(Np-x)!\,(n-x)!\,(Nq-n+x)!\,N!}, \tag{30.98}$$

where in the last line $p = M/N$ and $q = 1 - p$. This is called the hypergeometric distribution.

By performing the relevant summations directly, it may be shown that the hypergeometric distribution has mean

$$E[X] = n\frac{M}{N} = np$$

and variance

$$V[X] = \frac{nM(N-M)(N-n)}{N^2(N-1)} = \frac{N-n}{N-1}\, npq.$$

In the UK National Lottery each participant chooses six different numbers between 1 and 49. In each weekly draw six numbered winning balls are subsequently drawn. Find the probabilities that a participant chooses 0, 1, 2, 3, 4, 5, 6 winning numbers correctly.

The probabilities are given by a hypergeometric distribution with $N$ (the total number of balls) = 49, $M$ (the number of winning balls drawn) = 6, and $n$ (the number of numbers chosen by each participant) = 6.
Thus, substituting in (30.97), we find

$$\Pr(0) = \frac{{}^6C_0\, {}^{43}C_6}{{}^{49}C_6} = \frac{1}{2.29}, \qquad \Pr(1) = \frac{{}^6C_1\, {}^{43}C_5}{{}^{49}C_6} = \frac{1}{2.42},$$

$$\Pr(2) = \frac{{}^6C_2\, {}^{43}C_4}{{}^{49}C_6} = \frac{1}{7.55}, \qquad \Pr(3) = \frac{{}^6C_3\, {}^{43}C_3}{{}^{49}C_6} = \frac{1}{56.6},$$

$$\Pr(4) = \frac{{}^6C_4\, {}^{43}C_2}{{}^{49}C_6} = \frac{1}{1032}, \qquad \Pr(5) = \frac{{}^6C_5\, {}^{43}C_1}{{}^{49}C_6} = \frac{1}{54\,200},$$

$$\Pr(6) = \frac{{}^6C_6\, {}^{43}C_0}{{}^{49}C_6} = \frac{1}{13.98 \times 10^6}.$$

It can easily be seen that

$$\sum_{i=0}^{6} \Pr(i) = 0.44 + 0.41 + 0.13 + 0.02 + O(10^{-3}) = 1,$$

as expected.

Note that if the number of trials (balls drawn) is small compared with $N$, $M$ and $N - M$ then not replacing the balls is of little consequence, and we may approximate the hypergeometric distribution by the binomial distribution (with $p = M/N$); this is much easier to evaluate.

30.8.4 The Poisson distribution

We have seen that the binomial distribution describes the number of successful outcomes in a certain number of trials $n$. The Poisson distribution also describes the probability of obtaining a given number of successes but for situations in which the number of ‘trials’ cannot be enumerated; rather it describes the situation in which discrete events occur in a continuum. Typical examples of discrete random variables $X$ described by a Poisson distribution are the number of telephone calls received by a switchboard in a given interval, or the number of stars above a certain brightness in a particular area of the sky. Given a mean rate of occurrence $\lambda$ of these events in the relevant interval or area, the Poisson distribution gives the probability $\Pr(X = x)$ that exactly $x$ events will occur.

We may derive the form of the Poisson distribution as the limit of the binomial distribution when the number of trials $n \to \infty$ and the probability of ‘success’ $p \to 0$, in such a way that $np = \lambda$ remains finite. Thus, in our example of a telephone switchboard, suppose we wish to find the probability that exactly $x$ calls are received during some time interval, given that the mean number of calls in such an interval is $\lambda$.
Let us begin by dividing the time interval into a large number, $n$, of equal shorter intervals, in each of which the probability of receiving a call is $p$. As we let $n \to \infty$ then $p \to 0$, but since we require the mean number of calls in the interval to equal $\lambda$, we must have $np = \lambda$. The probability of $x$ successes in $n$ trials is given by the binomial formula as

$$\Pr(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x (1 - p)^{n-x}. \tag{30.99}$$

Now as $n \to \infty$, with $x$ finite, the ratio of the $n$-dependent factorials in (30.99) behaves asymptotically as a power of $n$, i.e.

$$\lim_{n \to \infty} \frac{n!}{(n-x)!} = \lim_{n \to \infty} n(n-1)(n-2) \cdots (n-x+1) \sim n^x.$$

Also

$$\lim_{n \to \infty} \lim_{p \to 0} (1 - p)^{n-x} = \lim_{p \to 0} \frac{(1 - p)^{\lambda/p}}{(1 - p)^x} = \frac{e^{-\lambda}}{1}.$$

Thus, using $\lambda = np$, (30.99) tends to the Poisson distribution

$$f(x) = \Pr(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \tag{30.100}$$

which gives the probability of obtaining exactly $x$ calls in the given time interval. As we shall show below, $\lambda$ is the mean of the distribution. Events following a Poisson distribution are usually said to occur randomly in time.

Alternatively we may derive the Poisson distribution directly, without considering a limit of the binomial distribution. Let us again consider our example of a telephone switchboard. Suppose that the probability that $x$ calls have been received in a time interval $t$ is $P_x(t)$. If the average number of calls received in a unit time is $\lambda$ then in a further small time interval $\Delta t$ the probability of receiving a call is $\lambda \Delta t$, provided $\Delta t$ is short enough that the probability of receiving two or more calls in this small interval is negligible. Similarly the probability of receiving no call during the same small interval is simply $1 - \lambda \Delta t$.

Thus, for $x > 0$, the probability of receiving exactly $x$ calls in the total interval $t + \Delta t$ is given by

$$P_x(t + \Delta t) = P_x(t)(1 - \lambda \Delta t) + P_{x-1}(t) \lambda \Delta t.$$

Rearranging the equation, dividing through by $\Delta t$ and letting $\Delta t \to 0$, we obtain the differential recurrence equation

$$\frac{dP_x(t)}{dt} = \lambda P_{x-1}(t) - \lambda P_x(t). \tag{30.101}$$

For $x = 0$ (i.e.
no calls received), however, (30.101) simplifies to

$$\frac{dP_0(t)}{dt} = -\lambda P_0(t),$$

which may be integrated to give $P_0(t) = P_0(0) e^{-\lambda t}$. But since the probability $P_0(0)$ of receiving no calls in a zero time interval must equal unity, we have $P_0(t) = e^{-\lambda t}$. This expression for $P_0(t)$ may then be substituted back into (30.101) with $x = 1$ to obtain a differential equation for $P_1(t)$ that has the solution $P_1(t) = \lambda t e^{-\lambda t}$. We may repeat this process to obtain expressions for $P_2(t), P_3(t), \ldots, P_x(t)$, and we find

$$P_x(t) = \frac{(\lambda t)^x}{x!}\, e^{-\lambda t}. \tag{30.102}$$

By setting $t = 1$ in (30.102), we again obtain the Poisson distribution (30.100) for obtaining exactly $x$ calls in a unit time interval.

If a discrete random variable is described by a Poisson distribution of mean $\lambda$ then we write $X \sim \text{Po}(\lambda)$. As it must be, the sum of the probabilities is unity:

$$\sum_{x=0}^{\infty} \Pr(X = x) = e^{-\lambda} \sum_{x=0}^{\infty} \frac{\lambda^x}{x!} = e^{-\lambda} e^{\lambda} = 1.$$

From (30.100) we may also derive the Poisson recurrence formula,

$$\Pr(X = x + 1) = \frac{\lambda}{x+1} \Pr(X = x) \qquad \text{for } x = 0, 1, 2, \ldots, \tag{30.103}$$

which enables successive probabilities to be calculated easily once one is known.

A person receives on average one e-mail message per half-hour interval. Assuming that the e-mails are received randomly in time, find the probabilities that in any particular hour 0, 1, 2, 3, 4, 5 messages are received.

Let $X$ = number of e-mails received per hour. Clearly the mean number of e-mails per hour is two, and so $X$ follows a Poisson distribution with $\lambda = 2$, i.e.

$$\Pr(X = x) = \frac{2^x}{x!}\, e^{-2}.$$

Thus $\Pr(X = 0) = e^{-2} = 0.135$, $\Pr(X = 1) = 2e^{-2} = 0.271$, $\Pr(X = 2) = 2^2 e^{-2}/2! = 0.271$, $\Pr(X = 3) = 2^3 e^{-2}/3! = 0.180$, $\Pr(X = 4) = 2^4 e^{-2}/4! = 0.090$, $\Pr(X = 5) = 2^5 e^{-2}/5! = 0.036$. These results may also be calculated using the recurrence formula (30.103).
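The use of the recurrence formula (30.103) in the e-mail example can be sketched in a few lines of Python. This is an illustrative check rather than part of the original text, and the function names are assumptions. Starting from $\Pr(X = 0) = e^{-\lambda}$, each further probability is obtained by multiplying the previous one by $\lambda/(x+1)$; the direct formula (30.100) is evaluated alongside for comparison.

```python
from math import exp, factorial


def poisson_pmf_direct(lam, x_max):
    """[Pr(X=0), ..., Pr(X=x_max)] from the closed form (30.100)."""
    return [exp(-lam) * lam**x / factorial(x) for x in range(x_max + 1)]


def poisson_pmf_recurrence(lam, x_max):
    """Same probabilities built from Pr(X=0) with the recurrence (30.103)."""
    probs = [exp(-lam)]  # Pr(X = 0)
    for x in range(x_max):
        probs.append(probs[-1] * lam / (x + 1))
    return probs


# E-mail example: mean of two messages per hour, x = 0, 1, ..., 5.
for x, (direct, recurred) in enumerate(
    zip(poisson_pmf_direct(2.0, 5), poisson_pmf_recurrence(2.0, 5))
):
    print(x, round(direct, 3), round(recurred, 3))
```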
[Figure 30.12 Three Poisson distributions for different values of the parameter $\lambda$. Panels: $\lambda = 1$, $\lambda = 2$, $\lambda = 5$. Axes: $f(x)$ against $x$.]

The above example illustrates the point that a Poisson distribution typically rises and then falls. It either has a maximum when $x$ is equal to the integer part of $\lambda$ or, if $\lambda$ happens to be an integer, has equal maximal values at $x = \lambda - 1$ and $x = \lambda$. The Poisson distribution always has a long ‘tail’ towards higher values of $X$ but the higher the value of the mean the more symmetric the distribution becomes. Typical Poisson distributions are shown in figure 30.12.

Using the definitions of mean and variance, we may show that, for the Poisson distribution, $E[X] = \lambda$ and $V[X] = \lambda$. Nevertheless, as in the case of the binomial distribution, performing the relevant summations directly is rather tiresome, and these results are much more easily proved using the MGF.

The moment generating function for the Poisson distribution

The MGF of the Poisson distribution is given by

$$M_X(t) = E\left[e^{tX}\right] = \sum_{x=0}^{\infty} \frac{e^{tx} e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}, \tag{30.104}$$

from which we obtain

$$M_X'(t) = \lambda e^t e^{\lambda(e^t - 1)},$$

$$M_X''(t) = (\lambda^2 e^{2t} + \lambda e^t)\, e^{\lambda(e^t - 1)}.$$

Thus, the mean and variance of the Poisson distribution are given by

$$E[X] = M_X'(0) = \lambda \qquad \text{and} \qquad V[X] = M_X''(0) - \left[M_X'(0)\right]^2 = \lambda.$$

The Poisson approximation to the binomial distribution

Earlier we derived the Poisson distribution as the limit of the binomial distribution when $n \to \infty$ and $p \to 0$ in such a way that $np = \lambda$ remains finite, where $\lambda$ is the mean of the Poisson distribution. It is not surprising, therefore, that the Poisson distribution is a very good approximation to the binomial distribution for large $n$ ($\geq 50$, say) and small $p$ ($\leq 0.1$, say). Moreover, it is easier to calculate as it involves fewer factorials.
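The quality of the Poisson approximation can be explored numerically. The Python sketch below is illustrative only; the function names and the two parameter choices are assumptions made for this example, not values from the text. It computes the largest pointwise difference between $\text{Bin}(n, p)$ and $\text{Po}(np)$ over $x = 0, 1, \ldots, n$, once for a case inside the suggested range (large $n$, small $p$) and once for a case well outside it.

```python
from math import comb, exp, lgamma, log


def binomial_pmf(n, p, x):
    """Pr(X = x) for X ~ Bin(n, p), from the binomial formula."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def poisson_pmf(lam, x):
    """Pr(X = x) for X ~ Po(lam), lam > 0; lgamma(x + 1) = log(x!)."""
    return exp(x * log(lam) - lam - lgamma(x + 1))


def max_abs_difference(n, p):
    """Largest |Bin(n, p) - Po(np)| over x = 0, 1, ..., n."""
    lam = n * p
    return max(
        abs(binomial_pmf(n, p, x) - poisson_pmf(lam, x)) for x in range(n + 1)
    )


# Inside the suggested range: n large, p small.
print(max_abs_difference(100, 0.02))

# Outside it: n small, p large, so the approximation degrades.
print(max_abs_difference(10, 0.5))
```

Working with logarithms in `poisson_pmf` is a design choice for this sketch: it avoids forming very large factorials explicitly when $x$ is large.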
In a large batch of light bulbs, the probability that a bulb is defective is 0.5%. For a sample of 200 bulbs taken at random, find the approximate probabilities that 0, 1 and 2 of the bulbs respectively are defective.

Let the random variable $X$ = number of defective bulbs in a sample. This is distributed as $X \sim \text{Bin}(200, 0.005)$, implying that $\lambda = np = 1.0$. Since $n$ is large and $p$ small, we may approximate the distribution as $X \sim \text{Po}(1)$, giving

$$\Pr(X = x) \approx e^{-1}\, \frac{1^x}{x!},$$

from which we find $\Pr(X = 0) \approx 0.37$, $\Pr(X = 1) \approx 0.37$, $\Pr(X = 2) \approx 0.18$. For comparison, it may be noted that the exact values calculated from the binomial distribution are identical to those found here to two decimal places.

Multiple Poisson distributions

Mirroring our discussion of multiple binomial distributions in subsection 30.8.1, let us suppose $X$ and $Y$ are two independent random variables, both of which are described by Poisson distributions with (in general) different means, so that $X \sim \text{Po}(\lambda_1)$ and $Y \sim \text{Po}(\lambda_2)$. Now consider the random variable $Z = X + Y$. We may calculate the probability distribution of $Z$ directly using (30.60), but we may derive the result much more easily by using the moment generating function (or indeed the probability or cumulant generating functions).

Since $X$ and $Y$ are independent RVs, the MGF for $Z$ is simply the product of the individual MGFs for $X$ and $Y$. Thus, from (30.104),

$$M_Z(t) = M_X(t) M_Y(t) = e^{\lambda_1(e^t - 1)}\, e^{\lambda_2(e^t - 1)} = e^{(\lambda_1 + \lambda_2)(e^t - 1)},$$

which we recognise as the MGF of $Z \sim \text{Po}(\lambda_1 + \lambda_2)$. Hence $Z$ is also Poisson distributed and has mean $\lambda_1 + \lambda_2$. Unfortunately, no such simple result holds for the difference $Z = X - Y$ of two independent Poisson variates. A closed-form