STATISTICS

The number of degrees of freedom is $n = N - M = 8$. Setting $n = 8$ and $\alpha = 0.05$ in (31.127), we find from table 31.2 that $k = 15.51$. Hence our rejection region is $\chi^2(\hat{m}, \hat{c}) > 15.51$. Since above we found $\chi^2(\hat{m}, \hat{c}) = 11.5$, we cannot reject the null hypothesis that the underlying model for the data is a straight line $y = mx + c$.

As mentioned above, our analysis is only valid if the function $f(x; \mathbf{a})$ is linear in the parameters $\mathbf{a}$. Nevertheless, it is so convenient that it is sometimes applied in non-linear cases, provided the non-linearity is not too severe.

31.8 Exercises

31.1 A group of students uses a pendulum experiment to measure $g$, the acceleration of free fall, and obtains the following values (in $\mathrm{m\,s^{-2}}$): 9.80, 9.84, 9.72, 9.74, 9.87, 9.77, 9.28, 9.86, 9.81, 9.79, 9.82. What would you give as the best value and standard error for $g$ as measured by the group?

31.2 Measurements of a certain quantity gave the following values: 296, 316, 307, 278, 312, 317, 314, 307, 313, 306, 320, 309. Within what limits would you say there is a 50% chance that the correct value lies?

31.3 The following are the values obtained by a class of 14 students when measuring a physical quantity $x$: 53.8, 53.1, 56.9, 54.7, 58.2, 54.1, 56.4, 54.8, 57.3, 51.0, 55.1, 55.0, 54.2, 56.6.
(a) Display these results as a histogram and state what you would give as the best value for $x$.
(b) Without calculation, estimate how much reliance could be placed upon your answer to (a).
(c) Databooks give the value of $x$ as 53.6 with negligible error. Are the data obtained by the students in conflict with this?

31.4 Two physical quantities $x$ and $y$ are connected by the equation
$$y^{1/2} = \frac{x}{a x^{1/2} + b},$$
and measured pairs of values for $x$ and $y$ are as follows:

x:   10    12    16    20
y:   409   196   114   94

Determine the best values for $a$ and $b$ by graphical means, and (either by hand or by using a built-in calculator routine) by a least-squares fit to an appropriate straight line.
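For the least-squares part of exercise 31.4, one natural linearisation (our assumption; other choices are possible) rewrites the relation as $x/y^{1/2} = a x^{1/2} + b$, a straight line in the variables $x^{1/2}$ and $x/y^{1/2}$. The minimal Python sketch below performs an unweighted fit of that line. It ignores the fact that transforming the data also distorts their errors, so its output is best treated as a cross-check on the graphical estimate rather than a definitive answer; the helper name `fit_line` is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Unweighted least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Measured pairs from exercise 31.4
x_data = [10.0, 12.0, 16.0, 20.0]
y_data = [409.0, 196.0, 114.0, 94.0]

# Assumed linearisation: y^(1/2) = x / (a x^(1/2) + b)  <=>  x / y^(1/2) = a x^(1/2) + b
X = [math.sqrt(x) for x in x_data]
Y = [x / math.sqrt(y) for x, y in zip(x_data, y_data)]

a_hat, b_hat = fit_line(X, Y)  # slope estimates a, intercept estimates b
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

With these data the sketch gives roughly $a \approx 1.2$ and $b \approx -3.3$.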
31.5 Measured quantities $x$ and $y$ are known to be connected by the formula
$$y = \frac{ax}{x^2 + b},$$
where $a$ and $b$ are constants. Pairs of values obtained experimentally are

x:   2.0    3.0    4.0    5.0    6.0
y:   0.32   0.29   0.25   0.21   0.18

Use these data to make best estimates of the values of $y$ that would be obtained for (a) $x = 7.0$, and (b) $x = -3.5$. As measured by fractional error, which estimate is likely to be the more accurate?

31.6 Prove that the sample mean is the best linear unbiased estimator of the population mean $\mu$ as follows.
(a) If the real numbers $a_1, a_2, \ldots, a_n$ satisfy the constraint $\sum_{i=1}^{n} a_i = C$, where $C$ is a given constant, show that $\sum_{i=1}^{n} a_i^2$ is minimised by $a_i = C/n$ for all $i$.
(b) Consider the linear estimator $\hat{\mu} = \sum_{i=1}^{n} a_i x_i$. Impose the conditions (i) that it is unbiased and (ii) that it is as efficient as possible.

31.7 A population contains individuals of $k$ types in equal proportions. A quantity $X$ has mean $\mu_i$ amongst individuals of type $i$ and variance $\sigma^2$, which has the same value for all types. In order to estimate the mean of $X$ over the whole population, two schemes are considered; each involves a total sample size of $nk$. In the first the sample is drawn randomly from the whole population, whilst in the second (stratified sampling) $n$ individuals are randomly selected from each of the $k$ types. Show that in both cases the estimate has expectation
$$\mu = \frac{1}{k}\sum_{i=1}^{k} \mu_i,$$
but that the variance of the first scheme exceeds that of the second by an amount
$$\frac{1}{k^2 n}\sum_{i=1}^{k} (\mu_i - \mu)^2.$$

31.8 Carry through the following proofs of statements made in subsections 31.5.2 and 31.5.3 about the ML estimators $\hat{\tau}$ and $\hat{\lambda}$.
(a) Find the expectation values of the ML estimators $\hat{\tau}$ and $\hat{\lambda}$ given, respectively, in (31.71) and (31.75). Hence verify equations (31.76), which show that, even though an ML estimator is unbiased, it does not follow that functions of it are also unbiased.
(b) Show that $E[\hat{\tau}^2] = (N+1)\tau^2/N$ and hence prove that $\hat{\tau}$ is a minimum-variance estimator of $\tau$.
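A Monte Carlo check can make the results of exercise 31.8 more concrete. The sketch below assumes the exponential model $P(x|\tau) = \tau^{-1}e^{-x/\tau}$, for which $\hat{\tau}$ is the sample mean and $\hat{\lambda} = 1/\hat{\tau}$; the function name `simulate_estimators`, the seed and the values $\tau = 2$, $N = 5$ are arbitrary illustrative choices. If the algebra is right, the three averages should come out close to $\tau$, $(N+1)\tau^2/N$ and $N\lambda/(N-1)$ respectively, the last exposing the bias of $\hat{\lambda}$.

```python
import random


def simulate_estimators(tau, n, trials, seed=1):
    """Monte Carlo averages of tau_hat, tau_hat**2 and lambda_hat = 1/tau_hat
    for samples of size n drawn from an exponential distribution with mean tau."""
    rng = random.Random(seed)
    sum_tau = sum_tau2 = sum_lam = 0.0
    for _ in range(trials):
        # expovariate takes the rate parameter, i.e. 1/mean
        sample = [rng.expovariate(1.0 / tau) for _ in range(n)]
        tau_hat = sum(sample) / n      # ML estimator of tau: the sample mean
        sum_tau += tau_hat
        sum_tau2 += tau_hat ** 2
        sum_lam += 1.0 / tau_hat       # ML estimator of lambda = 1/tau
    return sum_tau / trials, sum_tau2 / trials, sum_lam / trials


tau, n = 2.0, 5
mean_tau, mean_tau2, mean_lam = simulate_estimators(tau, n, trials=100_000)
print("E[tau_hat]    ~", mean_tau, "   tau =", tau)
print("E[tau_hat^2]  ~", mean_tau2, "   (N+1) tau^2 / N =", (n + 1) * tau**2 / n)
print("E[lambda_hat] ~", mean_lam, "   N / ((N-1) tau) =", n / ((n - 1) * tau))
```

Since $\mathrm{Var}[\hat{\tau}] = E[\hat{\tau}^2] - \tau^2$, the second line also gives a numerical estimate of the variance to compare with the minimum-variance bound.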
31.9 Each of a series of experiments consists of a large, but unknown, number $n$ ($\gg 1$) of trials in each of which the probability of success $p$ is the same, but also unknown. In the $i$th experiment, $i = 1, 2, \ldots, N$, the total number of successes is $x_i$ ($\gg 1$). Determine the log-likelihood function. Using Stirling's approximation to $\ln(n - x)!$, show that
$$\frac{d \ln(n - x)!}{dn} \approx \frac{1}{2(n - x)} + \ln(n - x),$$
and hence evaluate $\partial({}^{n}C_{x})/\partial n$. By finding the (coupled) equations determining the ML estimators $\hat{p}$ and $\hat{n}$, show that, to order $n^{-1}$, they must satisfy the simultaneous 'arithmetic' and 'geometric' mean constraints
$$\hat{n}\hat{p} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad\text{and}\qquad (1 - \hat{p})^N = \prod_{i=1}^{N}\left(1 - \frac{x_i}{\hat{n}}\right).$$

31.10 This exercise is intended to illustrate the dangers of applying formalised estimator techniques to distributions that are not well behaved in a statistical sense. The following are five sets of 10 values, all drawn from the same Cauchy distribution with parameter $a$.

(i) −1.24 −8.32 1.54 −4.75 4.57 2.65 202.76 0.44 −3.33 −7.76 4.81 −1.13 0.07 1.86 0.72 −2.00 −0.15 0.36 0.24 1.59 (ii) (iii) (iv) (v) 1.30 2.62 0.38 4.81 0.86 −17.44 −0.21 3.36 −1.30 0.91 −0.23 −0.79 −2.76 1.14 −3.86 −2.26 −0.58 −2.96 3.05 2.80 2.98 −2.85 −8.82 −0.66 0.30 −8.83 −0.14 5.51 3.99 −6.46

Ignoring the fact that the Cauchy distribution does not have a finite variance (or even a formal mean), show that $\hat{a}$, the ML estimator of $a$, has to satisfy
$$s(\hat{a}) = \sum_{i=1}^{10} \frac{1}{1 + x_i^2/\hat{a}^2} = 5. \qquad (*)$$
Using a programmable calculator, spreadsheet or computer, find the value of $\hat{a}$ that satisfies $(*)$ for each of the data sets and compare it with the value $a = 1.6$ used to generate the data. Form an opinion regarding the variance of the estimator.
Show further that if it is assumed that $(E[\hat{a}])^2 = E[\hat{a}^2]$, then $E[\hat{a}] = \nu_2^{1/2}$, where $\nu_2$ is the second (central) moment of the distribution, which for the Cauchy distribution is infinite!

31.11 According to a particular theory, two dimensionless quantities $X$ and $Y$ have equal values.
Nine measurements of $X$ gave values of 22, 11, 19, 19, 14, 27, 8, 24 and 18, whilst seven measured values of $Y$ were 11, 14, 17, 14, 19, 16 and 14. Assuming that the measurements of both quantities are Gaussian distributed with a common variance, are they consistent with the theory? An alternative theory predicts that $Y^2 = \pi^2 X$; are the data consistent with this proposal?

31.12 On a certain (testing) steeplechase course there are 12 fences to be jumped, and any horse that falls is not allowed to continue in the race. In a season of racing a total of 500 horses started the course and the following numbers fell at each fence:

Fence:   1    2    3    4    5    6    7    8    9    10   11   12
Falls:   62   75   49   29   33   25   30   17   19   11   15   12

Use this data to determine the overall probability of a horse's falling at a fence, and test the hypothesis that it is the same for all horses and fences as follows.
(a) Draw up a table of the expected number of falls at each fence on the basis of the hypothesis.
(b) Consider for each fence $i$ the standardised variable
$$z_i = \frac{\text{estimated falls} - \text{actual falls}}{\text{standard deviation of estimated falls}},$$
and use it in an appropriate $\chi^2$ test.
(c) Show that the data indicates that the odds against all fences being equally testing are about 40 to 1. Identify the fences that are significantly easier or harder than the average.

31.13 A similar technique to that employed in exercise 31.12 can be used to test correlations between characteristics of sampled data. To illustrate this consider the following problem. During an investigation into possible links between mathematics and classical music, pupils at a school were asked whether they had preferences (a) between mathematics and English, and (b) between classical and pop music. The results are given below.
                 Classical   None   Pop
Mathematics         23        13     14
None                17        17     36
English             30        10     40

By computing tables of expected numbers, based on the assumption that no correlations exist, and calculating the relevant values of $\chi^2$, determine whether there is any evidence for (a) a link between academic and musical tastes, and (b) a claim that pupils either had preferences in both areas or had no preference. You will need to consider the appropriate value for the number of degrees of freedom to use when applying the $\chi^2$ test.

31.14 Three candidates X, Y and Z were standing for election to a vacant seat on their college's Student Committee. The members of the electorate (current first-year students, consisting of 150 men and 105 women) were each allowed to cross out the name of the candidate they least wished to be elected, the other two candidates then being credited with one vote each. The following data are known.
(a) X received 100 votes from men, whilst Y received 65 votes from women.
(b) Z received five more votes from men than X received from women.
(c) The total votes cast for X and Y were equal.
Analyse this data in such a way that a $\chi^2$ test can be used to determine whether voting was other than random (i) amongst men and (ii) amongst women.

31.15 A particle detector consisting of a shielded scintillator is being tested by placing it near a particle source whose intensity can be controlled by the use of absorbers. It might register counts even in the absence of particles from the source because of the cosmic ray background. The number of counts $n$ registered in a fixed time interval as a function of the source strength $s$ is given as:

source strength s:   0    1    2    3    4    5    6
counts n:            6    11   20   42   44   62   61

At any given source strength, the number of counts is expected to be Poisson distributed with mean $n = a + bs$, where $a$ and $b$ are constants. Analyse the data for a fit to this relationship and obtain the best values for $a$ and $b$ together with their standard errors.
(a) How well is the cosmic ray background determined?
(b) What is the value of the correlation coefficient between $a$ and $b$? Is this consistent with what would happen if the cosmic ray background were imagined to be negligible?
(c) Do the data fit the expected relationship well? Is there any evidence that the reported data 'are too good a fit'?

31.16 The function $y(x)$ is known to be a quadratic function of $x$. The following table gives the measured values and uncorrelated standard errors of $y$ measured at various values of $x$ (in which there is negligible error):

x:      1           2           3           4           5
y(x):   3.5 ± 0.5   2.0 ± 0.5   3.0 ± 0.5   6.5 ± 1.0   10.5 ± 1.0

Construct the response matrix $\mathsf{R}$ using as basis functions $1, x, x^2$. Calculate the matrix $\mathsf{R}^{\mathrm{T}}\mathsf{N}^{-1}\mathsf{R}$ and show that its inverse, the covariance matrix $\mathsf{V}$, has the form
$$\mathsf{V} = \frac{1}{9184}\begin{pmatrix} 12592 & -9708 & 1580 \\ -9708 & 8413 & -1461 \\ 1580 & -1461 & 269 \end{pmatrix}.$$
Use this matrix to find the best values, and their uncertainties, for the coefficients of the quadratic form for $y(x)$.

31.17 The following are the values and standard errors of a physical quantity $f(\theta)$ measured at various values of $\theta$ (in which there is negligible error):

θ:      0            π/6          π/4           π/3
f(θ):   3.72 ± 0.2   1.98 ± 0.1   −0.06 ± 0.1   −2.05 ± 0.1

θ:      π/2           2π/3         3π/4         π
f(θ):   −2.83 ± 0.2   1.15 ± 0.1   3.99 ± 0.2   9.71 ± 0.4

Theory suggests that $f$ should be of the form $a_1 + a_2\cos\theta + a_3\cos 2\theta$. Show that the normal equations for the coefficients $a_i$ are
$$\begin{aligned} 481.3a_1 + 158.4a_2 - 43.8a_3 &= 284.7,\\ 158.4a_1 + 218.8a_2 + 62.1a_3 &= -31.1,\\ -43.8a_1 + 62.1a_2 + 131.3a_3 &= 368.4. \end{aligned}$$
(a) If you have matrix inversion routines available on a computer, determine the best values and variances for the coefficients $a_i$ and the correlation between the coefficients $a_1$ and $a_2$.
(b) If you have only a calculator available, solve for the values using a Gauss–Seidel iteration and start from the approximate solution $a_1 = 2$, $a_2 = -2$, $a_3 = 4$.
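The pure-Python sketch below may help check the arithmetic in exercises 31.16 and 31.17. For 31.16 it builds $\mathsf{R}^{\mathrm{T}}\mathsf{N}^{-1}\mathsf{R}$ in exact rational arithmetic, taking $\mathsf{N}$ diagonal with entries $\sigma_i^2$, inverts it by Gauss–Jordan elimination and compares $9184\,\mathsf{V}$ with the quoted integer matrix; the best coefficients are then $\mathsf{V}\mathsf{R}^{\mathrm{T}}\mathsf{N}^{-1}\mathbf{y}$ and their uncertainties are $\sqrt{V_{jj}}$. For 31.17 it applies a plain Gauss–Seidel iteration to the quoted normal equations from the stated starting point; convergence is expected because that matrix is symmetric positive definite. The helper names `invert` and `gauss_seidel` and the tolerance are our own illustrative choices.

```python
from fractions import Fraction


def invert(m):
    """Gauss-Jordan inverse of a small square matrix of Fractions (exact arithmetic)."""
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)  # first non-zero pivot
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gauss_seidel(A, b, x0, tol=1e-10, max_iter=1000):
    """Solve A x = b by Gauss-Seidel iteration, reusing updated components at once."""
    x = list(x0)
    n = len(b)
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new = (b[i] - s) / A[i][i]
            max_change = max(max_change, abs(new - x[i]))
            x[i] = new
        if max_change < tol:
            return x
    raise RuntimeError("Gauss-Seidel iteration did not converge")


# Exercise 31.16: basis functions 1, x, x^2 with weights 1/sigma^2
xs = [1, 2, 3, 4, 5]
ys = [Fraction(s) for s in ("3.5", "2.0", "3.0", "6.5", "10.5")]
sig = [Fraction(s) for s in ("0.5", "0.5", "0.5", "1.0", "1.0")]
R = [[Fraction(1), Fraction(x), Fraction(x * x)] for x in xs]
w = [1 / s**2 for s in sig]
RtNR = [[sum(w[i] * R[i][j] * R[i][k] for i in range(5)) for k in range(3)] for j in range(3)]
RtNy = [sum(w[i] * R[i][j] * ys[i] for i in range(5)) for j in range(3)]
V = invert(RtNR)
V_scaled = [[v * 9184 for v in row] for row in V]  # should equal the quoted integer matrix
coeffs = [sum(V[j][k] * RtNy[k] for k in range(3)) for j in range(3)]
errors = [float(V[j][j]) ** 0.5 for j in range(3)]
print("9184 V =", [[int(v) for v in row] for row in V_scaled])
print("coefficients:", [round(float(c), 4) for c in coeffs])
print("standard errors:", [round(e, 3) for e in errors])

# Exercise 31.17: Gauss-Seidel on the quoted normal equations
A = [[481.3, 158.4, -43.8], [158.4, 218.8, 62.1], [-43.8, 62.1, 131.3]]
b = [284.7, -31.1, 368.4]
a_sol = gauss_seidel(A, b, [2.0, -2.0, 4.0])
print("a1, a2, a3 =", [round(v, 3) for v in a_sol])
```

Run as written, the quadratic coefficients come out at about 6.73, −4.34 and 1.03 (constant, linear, quadratic), and the iteration for 31.17 settles near $a_1 \approx 2.02$, $a_2 \approx -3.00$, $a_3 \approx 4.90$. Exact fractions are used for 31.16 only so that the comparison with the integer matrix is not blurred by rounding.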
31.18 Prove that the expression given for the Student's t-distribution in equation (31.118) is correctly normalised.

31.19 Verify that the F-distribution $P(F)$ given explicitly in equation (31.126) is symmetric between the two data samples, i.e. that it retains the same form but with $N_1$ and $N_2$ interchanged, if $F$ is replaced by $F' = F^{-1}$. Symbolically, if $P'(F')$ is the distribution of $F'$ and $P(F) = \eta(F, N_1, N_2)$, then $P'(F') = \eta(F', N_2, N_1)$.

31.20 It is claimed that the two following sets of values were obtained (a) by randomly drawing from a normal distribution that is $N(0, 1)$ and then (b) randomly assigning each reading to one of two sets A and B:

Set A: Set B: −0.314 0.610 −0.691 0.603 0.482 1.515 −0.551 −1.757 −1.642 −0.537 0.058 −1.736 −0.160 −1.635 0.719 1.224 1.423 1.165

Make tests, including t- and F-tests, to establish whether there is any evidence that either claim is, or both claims are, false.
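Before attempting the analytic proofs in exercises 31.18 and 31.19, a numerical check can be reassuring. The sketch below assumes the standard forms of the two densities written in terms of degrees of freedom ($\nu$ for Student's t; $d_1$, $d_2$ for F) rather than the sample sizes used in (31.118) and (31.126); the correspondence between the two parametrisations (presumably $\nu = N - 1$ and $d_i = N_i - 1$) should be checked against the text. It integrates the t density over the whole real line using the substitution $t = \tan u$ and a midpoint rule, and compares the density of $F' = 1/F$, obtained by change of variable, with the F density that has its degrees of freedom swapped. The function names are hypothetical.

```python
import math


def student_t_pdf(t, nu):
    """Standard Student's t density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)


def integrate_real_line(f, n=20000):
    """Midpoint rule for the integral of f over (-inf, inf), after substituting t = tan(u)."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        u = -math.pi / 2 + (k + 0.5) * h
        total += f(math.tan(u)) / math.cos(u) ** 2  # dt = sec^2(u) du
    return total * h


def f_pdf(x, d1, d2):
    """Standard F density with (d1, d2) degrees of freedom, for x > 0."""
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_c + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))


# Normalisation of the t density for a few values of nu
for nu in (1, 2, 5, 10):
    print(nu, integrate_real_line(lambda t: student_t_pdf(t, nu)))

# Symmetry: the density of F' = 1/F is f_pdf(1/F', d1, d2) / F'^2 (change of variable),
# which should coincide with f_pdf(F', d2, d1)
for fp in (0.3, 1.0, 2.5):
    print(fp, f_pdf(1 / fp, 4, 9) / fp**2, f_pdf(fp, 9, 4))
```

The substitution $t = \tan u$ maps the infinite range onto a finite one on which the integrand stays bounded for $\nu \ge 1$, so a simple midpoint rule suffices; a numerical agreement of this kind is, of course, only supporting evidence and not a substitute for the proofs asked for.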