Exercises

by taratuta

on 20-01-2017

STATISTICS
number of degrees of freedom is $n = N - M = 8$. Setting $n = 8$ and $\alpha = 0.05$ in (31.127)
we find from table 31.2 that $k = 15.51$. Hence our rejection region is
\[ \chi^2(\hat{m}, \hat{c}) > 15.51. \]
Since above we found $\chi^2(\hat{m}, \hat{c}) = 11.5$, we cannot reject the null hypothesis that the
underlying model for the data is a straight line $y = mx + c$.

As mentioned above, our analysis is only valid if the function $f(x; a)$ is linear
in the parameters $a$. Nevertheless, it is so convenient that it is sometimes applied
in non-linear cases, provided the non-linearity is not too severe.
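
The tabulated value k = 15.51 can be checked numerically. The following is a minimal sketch, assuming only that the number of degrees of freedom is even, in which case the upper-tail probability of the χ² distribution has a closed form; the function names are illustrative and do not come from the text.

```python
import math

def chi2_sf_even(x, dof):
    """P(chi^2 > x) for an even number of degrees of freedom (closed form)."""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j          # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def chi2_critical(alpha, dof, lo=0.0, hi=200.0):
    """Bisect for k such that P(chi^2 > k) = alpha."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, dof) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k = chi2_critical(0.05, 8)        # close to the tabulated 15.51
chi2_observed = 11.5              # value quoted in the text
reject = chi2_observed > k        # False: the straight-line model is not rejected
```

For odd numbers of degrees of freedom this closed form does not apply, and a table or a library routine would be needed instead.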
31.8 Exercises
31.1
A group of students uses a pendulum experiment to measure g, the acceleration
of free fall, and obtains the following values (in m s$^{-2}$): 9.80, 9.84, 9.72, 9.74,
9.87, 9.77, 9.28, 9.86, 9.81, 9.79, 9.82. What would you give as the best value and
standard error for g as measured by the group?
31.2
Measurements of a certain quantity gave the following values: 296, 316, 307, 278,
312, 317, 314, 307, 313, 306, 320, 309. Within what limits would you say there is
a 50% chance that the correct value lies?
31.3
The following are the values obtained by a class of 14 students when measuring
a physical quantity x: 53.8, 53.1, 56.9, 54.7, 58.2, 54.1, 56.4, 54.8, 57.3, 51.0, 55.1,
55.0, 54.2, 56.6.
(a) Display these results as a histogram and state what you would give as the
best value for x.
(b) Without calculation, estimate how much reliance could be placed upon your
answer to (a).
(c) Databooks give the value of x as 53.6 with negligible error. Are the data
obtained by the students in conflict with this?
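
For exercise 31.3, the numerical side of parts (a) and (c) can be sketched as follows. This is one possible approach, not a model answer: it assumes the sample mean is taken as the best value and that the comparison with the databook figure is made through a Student's t statistic with N − 1 degrees of freedom. The variable names are illustrative.

```python
import math
import statistics

x = [53.8, 53.1, 56.9, 54.7, 58.2, 54.1, 56.4,
     54.8, 57.3, 51.0, 55.1, 55.0, 54.2, 56.6]

mean = statistics.mean(x)
s = statistics.stdev(x)               # sample standard deviation (N - 1 divisor)
std_err = s / math.sqrt(len(x))       # standard error in the mean
t = (mean - 53.6) / std_err           # compare with Student's t for len(x) - 1 d.o.f.
```

The value of t would then be compared with tabulated percentage points of the t-distribution to judge whether the discrepancy is significant.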
31.4
Two physical quantities x and y are connected by the equation
\[ y^{1/2} = \frac{x}{a x^{1/2} + b}, \]
and measured pairs of values for x and y are as follows:
  x:    10    12    16    20
  y:   409   196   114    94
Determine the best values for a and b by graphical means, and (either by hand
or by using a built-in calculator routine) by a least-squares fit to an appropriate
straight line.
31.5
Measured quantities x and y are known to be connected by the formula
\[ y = \frac{ax}{x^2 + b}, \]
where a and b are constants. Pairs of values obtained experimentally are
  x:   2.0    3.0    4.0    5.0    6.0
  y:   0.32   0.29   0.25   0.21   0.18
Use these data to make best estimates of the values of y that would be obtained
for (a) x = 7.0, and (b) x = −3.5. As measured by fractional error, which estimate
is likely to be the more accurate?
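
For exercise 31.5, one way to obtain the estimates is to rearrange the formula as x/y = x²/a + b/a, which is a straight line in the variables u = x² and v = x/y. The sketch below assumes an unweighted least-squares fit to that line (the exercise gives no errors, so equal weights on the transformed points are an assumption made here); u, v and predict are names introduced for illustration.

```python
x = [2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.32, 0.29, 0.25, 0.21, 0.18]

# Linearise: x/y = (1/a) * x**2 + (b/a), i.e. v = slope * u + intercept.
u = [xi ** 2 for xi in x]
v = [xi / yi for xi, yi in zip(x, y)]

n = len(u)
u_bar = sum(u) / n
v_bar = sum(v) / n
slope = (sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
         / sum((ui - u_bar) ** 2 for ui in u))
intercept = v_bar - slope * u_bar

a = 1.0 / slope
b = intercept / slope

def predict(x_new):
    """Evaluate y = a x / (x^2 + b) with the fitted constants."""
    return a * x_new / (x_new ** 2 + b)

y_at_7 = predict(7.0)
y_at_minus_3_5 = predict(-3.5)
```

Because x = 7.0 lies outside the measured range while |x| = 3.5 lies inside it, the two predictions depend on the fitted line in different ways, which is the point of the final question.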
31.6
Prove that the sample mean is the best linear unbiased estimator of the population
mean $\mu$ as follows.
(a) If the real numbers $a_1, a_2, \ldots, a_n$ satisfy the constraint $\sum_{i=1}^{n} a_i = C$, where C
is a given constant, show that $\sum_{i=1}^{n} a_i^2$ is minimised by $a_i = C/n$ for all i.
(b) Consider the linear estimator $\hat{\mu} = \sum_{i=1}^{n} a_i x_i$. Impose the conditions (i) that it
is unbiased and (ii) that it is as efficient as possible.
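
One possible route through both parts of exercise 31.6 is outlined below as a sketch, not as the intended solution. It uses a Lagrange multiplier λ (a symbol introduced here) and assumes that the x_i are independent, each with mean μ and variance σ².

```latex
% (a) Stationary point of the constrained sum of squares
\frac{\partial}{\partial a_j}\left[\sum_{i=1}^{n} a_i^2
      - \lambda\left(\sum_{i=1}^{n} a_i - C\right)\right]
   = 2a_j - \lambda = 0
   \quad\Rightarrow\quad a_j = \tfrac{1}{2}\lambda \ \text{for all } j
   \quad\Rightarrow\quad a_j = \frac{C}{n}.

% (b) Unbiasedness fixes C = 1; efficiency minimises the variance
E[\hat{\mu}] = \mu \sum_{i=1}^{n} a_i = \mu
   \quad\Rightarrow\quad \sum_{i=1}^{n} a_i = 1,
\qquad
V[\hat{\mu}] = \sigma^2 \sum_{i=1}^{n} a_i^2 .
```

Since the sum of squares is convex, the stationary point in (a) is a minimum; setting C = 1 in (b) then gives a_i = 1/n, which is the sample mean.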
31.7
A population contains individuals of k types in equal proportions. A quantity X
has mean $\mu_i$ amongst individuals of type i and variance $\sigma^2$, which has the same
value for all types. In order to estimate the mean of X over the whole population,
two schemes are considered; each involves a total sample size of nk. In the first
the sample is drawn randomly from the whole population, whilst in the second
(stratified sampling) n individuals are randomly selected from each of the k types.
Show that in both cases the estimate has expectation
\[ \mu = \frac{1}{k} \sum_{i=1}^{k} \mu_i, \]
but that the variance of the first scheme exceeds that of the second by an amount
\[ \frac{1}{k^2 n} \sum_{i=1}^{k} (\mu_i - \mu)^2. \]
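
The result of exercise 31.7 can be checked empirically before it is proved. The following Monte Carlo sketch rests on several assumptions that are not in the exercise: X is taken to be Gaussian within each type, the population is treated as effectively infinite, and the type means, σ and n are arbitrary illustrative values.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def simulate(mus, sigma, n, trials, seed=0):
    """Return the empirical variances of the two estimators of the mean."""
    rng = random.Random(seed)
    k = len(mus)
    simple, stratified = [], []
    for _ in range(trials):
        # Scheme 1: n*k individuals drawn at random from the whole population.
        draw = [rng.gauss(rng.choice(mus), sigma) for _ in range(n * k)]
        simple.append(sum(draw) / (n * k))
        # Scheme 2: exactly n individuals from each of the k types.
        draw = [rng.gauss(mu, sigma) for mu in mus for _ in range(n)]
        stratified.append(sum(draw) / (n * k))
    return variance(simple), variance(stratified)

mus, sigma, n = [0.0, 1.0, 5.0], 1.0, 4      # illustrative values only
var_simple, var_strat = simulate(mus, sigma, n, trials=20000)

k = len(mus)
mu = sum(mus) / k
predicted_excess = sum((m - mu) ** 2 for m in mus) / (k ** 2 * n)
```

The difference var_simple − var_strat should agree with predicted_excess to within the sampling noise of the simulation.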
31.8
Carry through the following proofs of statements made in subsections 31.5.2 and
31.5.3 about the ML estimators $\hat{\tau}$ and $\hat{\lambda}$.
(a) Find the expectation values of the ML estimators $\hat{\tau}$ and $\hat{\lambda}$ given, respectively,
in (31.71) and (31.75). Hence verify equations (31.76), which show that, even
though an ML estimator is unbiased, it does not follow that functions of it
are also unbiased.
(b) Show that $E[\hat{\tau}^2] = (N + 1)\tau^2/N$ and hence prove that $\hat{\tau}$ is a minimum-variance
estimator of $\tau$.
31.9
Each of a series of experiments consists of a large, but unknown, number n
($\gg 1$) of trials in each of which the probability of success p is the same, but also
unknown. In the ith experiment, i = 1, 2, . . . , N, the total number of successes is
$x_i$ ($\gg 1$). Determine the log-likelihood function.
Using Stirling's approximation to $\ln(n - x)!$, show that
\[ \frac{d \ln(n - x)!}{dn} \approx \frac{1}{2(n - x)} + \ln(n - x), \]
and hence evaluate $\partial({}^{n}C_x)/\partial n$.
By finding the (coupled) equations determining the ML estimators $\hat{p}$ and $\hat{n}$,
show that, to order $n^{-1}$, they must satisfy the simultaneous 'arithmetic' and
'geometric' mean constraints
\[ \hat{n}\hat{p} = \frac{1}{N} \sum_{i=1}^{N} x_i
   \quad\text{and}\quad
   (1 - \hat{p})^{N} = \prod_{i=1}^{N} \left(1 - \frac{x_i}{\hat{n}}\right). \]
31.10
This exercise is intended to illustrate the dangers of applying formalised estimator
techniques to distributions that are not well behaved in a statistical sense.
The following are five sets of 10 values, all drawn from the same Cauchy
distribution with parameter a.
  (i)      −1.24     4.81     1.30    −0.23     2.98
           −8.32    −1.13     2.62    −0.79    −2.85
  (ii)      1.54     0.07     0.38    −2.76    −8.82
           −4.75     1.86     4.81     1.14    −0.66
  (iii)     4.57     0.72     0.86    −3.86     0.30
            2.65    −2.00   −17.44    −2.26    −8.83
  (iv)    202.76    −0.15    −0.21    −0.58    −0.14
            0.44     0.36     3.36    −2.96     5.51
  (v)      −3.33     0.24    −1.30     3.05     3.99
           −7.76     1.59     0.91     2.80    −6.46
Ignoring the fact that the Cauchy distribution does not have a finite variance (or
even a formal mean), show that $\hat{a}$, the ML estimator of a, has to satisfy
\[ s(\hat{a}) = \sum_{i=1}^{10} \frac{1}{1 + x_i^2/\hat{a}^2} = 5. \qquad (*) \]
Using a programmable calculator, spreadsheet or computer, find the value of
$\hat{a}$ that satisfies (∗) for each of the data sets and compare it with the value
a = 1.6 used to generate the data. Form an opinion regarding the variance of the
estimator.
Show further that if it is assumed that $(E[\hat{a}])^2 = E[\hat{a}^2]$, then $E[\hat{a}] = \nu_2^{1/2}$, where
$\nu_2$ is the second (central) moment of the distribution, which for the Cauchy
distribution is infinite!
31.11
According to a particular theory, two dimensionless quantities X and Y have
equal values. Nine measurements of X gave values of 22, 11, 19, 19, 14, 27, 8,
24 and 18, whilst seven measured values of Y were 11, 14, 17, 14, 19, 16 and
14. Assuming that the measurements of both quantities are Gaussian distributed
with a common variance, are they consistent with the theory? An alternative
theory predicts that $Y^2 = \pi^2 X$; are the data consistent with this proposal?
31.12
On a certain (testing) steeplechase course there are 12 fences to be jumped, and
any horse that falls is not allowed to continue in the race. In a season of racing
a total of 500 horses started the course and the following numbers fell at each
fence:
  Fence:    1    2    3    4    5    6    7    8    9   10   11   12
  Falls:   62   75   49   29   33   25   30   17   19   11   15   12
Use this data to determine the overall probability of a horse's falling at a fence,
and test the hypothesis that it is the same for all horses and fences as follows.
(a) Draw up a table of the expected number of falls at each fence on the basis
of the hypothesis.
(b) Consider for each fence i the standardised variable
\[ z_i = \frac{\text{estimated falls} - \text{actual falls}}{\text{standard deviation of estimated falls}}, \]
and use it in an appropriate $\chi^2$ test.
(c) Show that the data indicates that the odds against all fences being equally
testing are about 40 to 1. Identify the fences that are significantly easier or
harder than the average.
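
The bookkeeping for exercise 31.12 can be sketched as follows. Two choices here are assumptions rather than part of the exercise: the expected falls at fence i are taken as p times the number of horses that actually reach that fence, and the standard deviation of the estimated falls is approximated by the square root of the expected number.

```python
import math

falls = [62, 75, 49, 29, 33, 25, 30, 17, 19, 11, 15, 12]
starters = 500

# Number of horses that actually reach each fence.
attempts = []
remaining = starters
for f in falls:
    attempts.append(remaining)
    remaining -= f

p = sum(falls) / sum(attempts)          # overall probability of a fall at a fence

expected = [n_i * p for n_i in attempts]
# Assumption: standard deviation of the estimated falls taken as sqrt(expected).
z = [(e - f) / math.sqrt(e) for e, f in zip(expected, falls)]
chi2 = sum(z_i ** 2 for z_i in z)
dof = len(falls) - 1                    # one parameter, p, was fitted from the data
```

The individual z values indicate which fences depart most from the average: a large negative z means more falls than expected, a large positive z fewer.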
31.13
A similar technique to that employed in exercise 31.12 can be used to test
correlations between characteristics of sampled data. To illustrate this consider
the following problem.
During an investigation into possible links between mathematics and classical
music, pupils at a school were asked whether they had preferences (a) between
mathematics and English, and (b) between classical and pop music. The results
are given below.
                 Classical    None    Pop
  Mathematics        23        13      14
  None               17        17      36
  English            30        10      40
By computing tables of expected numbers, based on the assumption that no
correlations exist, and calculating the relevant values of $\chi^2$, determine whether
there is any evidence for
(a) a link between academic and musical tastes, and
(b) a claim that pupils either had preferences in both areas or had no preference.
You will need to consider the appropriate value for the number of degrees of
freedom to use when applying the $\chi^2$ test.
31.14
Three candidates X, Y and Z were standing for election to a vacant seat on
their college's Student Committee. The members of the electorate (current
first-year students, consisting of 150 men and 105 women) were each allowed to
cross out the name of the candidate they least wished to be elected, the other
two candidates then being credited with one vote each. The following data are
known.
(a) X received 100 votes from men, whilst Y received 65 votes from women.
(b) Z received five more votes from men than X received from women.
(c) The total votes cast for X and Y were equal.
Analyse this data in such a way that a $\chi^2$ test can be used to determine whether
voting was other than random (i) amongst men and (ii) amongst women.
31.15
A particle detector consisting of a shielded scintillator is being tested by placing it
near a particle source whose intensity can be controlled by the use of absorbers.
It might register counts even in the absence of particles from the source because
of the cosmic ray background.
The number of counts n registered in a fixed time interval as a function of the
source strength s is given as:
  source strength s:    0    1    2    3    4    5    6
  counts n:             6   11   20   42   44   62   61
At any given source strength, the number of counts is expected to be Poisson
distributed with mean
\[ n = a + bs, \]
where a and b are constants. Analyse the data for a fit to this relationship and
obtain the best values for a and b together with their standard errors.
(a) How well is the cosmic ray background determined?
(b) What is the value of the correlation coefficient between a and b? Is this
consistent with what would happen if the cosmic ray background were
imagined to be negligible?
(c) Do the data fit the expected relationship well? Is there any evidence that the
reported data 'are too good a fit'?
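
A weighted straight-line fit for exercise 31.15 can be sketched as below. The main assumption, which is not stated in the exercise, is that the variance of each count is estimated by the observed count itself, as is common for Poisson data; the sums S, Sx, Sxx, Sy and Sxy are names introduced here for the standard closed-form solution.

```python
import math

s = [0, 1, 2, 3, 4, 5, 6]
n = [6, 11, 20, 42, 44, 62, 61]

# Assumption: each count's variance is estimated by the observed count (Poisson).
w = [1.0 / ni for ni in n]

S = sum(w)
Sx = sum(wi * si for wi, si in zip(w, s))
Sxx = sum(wi * si * si for wi, si in zip(w, s))
Sy = sum(wi * ni for wi, ni in zip(w, n))
Sxy = sum(wi * si * ni for wi, si, ni in zip(w, s, n))
delta = S * Sxx - Sx ** 2

a = (Sxx * Sy - Sx * Sxy) / delta        # intercept: background counts
b = (S * Sxy - Sx * Sy) / delta          # slope: counts per unit source strength
sigma_a = math.sqrt(Sxx / delta)
sigma_b = math.sqrt(S / delta)
corr_ab = (-Sx / delta) / (sigma_a * sigma_b)

chi2 = sum(wi * (ni - a - b * si) ** 2 for wi, si, ni in zip(w, s, n))
dof = len(s) - 2                         # seven points, two fitted parameters
```

Comparing sigma_a with a addresses part (a), corr_ab addresses part (b), and chi2 set against dof addresses part (c).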
31.16
The function y(x) is known to be a quadratic function of x. The following table
gives the measured values and uncorrelated standard errors of y measured at
various values of x (in which there is negligible error):
  x:       1            2            3            4            5
  y(x):    3.5 ± 0.5    2.0 ± 0.5    3.0 ± 0.5    6.5 ± 1.0    10.5 ± 1.0
Construct the response matrix R using as basis functions 1, x, $x^2$. Calculate the
matrix $R^{T} N^{-1} R$ and show that its inverse, the covariance matrix V, has the form
\[ V = \frac{1}{9184}
   \begin{pmatrix}
     12\,592 & -9708 & 1580 \\
     -9708 & 8413 & -1461 \\
     1580 & -1461 & 269
   \end{pmatrix}. \]
Use this matrix to find the best values, and their uncertainties, for the coefficients
of the quadratic form for y(x).
31.17
The following are the values and standard errors of a physical quantity f(θ)
measured at various values of θ (in which there is negligible error):
  θ:       0             π/6           π/4            π/3
  f(θ):    3.72 ± 0.2    1.98 ± 0.1    −0.06 ± 0.1    −2.05 ± 0.1
  θ:       π/2           2π/3          3π/4           π
  f(θ):    −2.83 ± 0.2   1.15 ± 0.1    3.99 ± 0.2     9.71 ± 0.4
Theory suggests that f should be of the form $a_1 + a_2 \cos\theta + a_3 \cos 2\theta$. Show that
the normal equations for the coefficients $a_i$ are
  481.3a1 + 158.4a2 − 43.8a3 = 284.7,
  158.4a1 + 218.8a2 + 62.1a3 = −31.1,
  −43.8a1 + 62.1a2 + 131.3a3 = 368.4.
(a) If you have matrix inversion routines available on a computer, determine the
best values and variances for the coefficients $a_i$ and the correlation between
the coefficients $a_1$ and $a_2$.
(b) If you have only a calculator available, solve for the values using a Gauss–
Seidel iteration and start from the approximate solution $a_1 = 2$, $a_2 = −2$,
$a_3 = 4$.
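
The Gauss–Seidel iteration of part (b) can be sketched in a few lines. This is a minimal illustration using the normal equations and starting values quoted in exercise 31.17; the function name, the tolerance and the sweep limit are choices made here, not part of the exercise.

```python
A = [[481.3, 158.4, -43.8],
     [158.4, 218.8, 62.1],
     [-43.8, 62.1, 131.3]]
rhs = [284.7, -31.1, 368.4]

def gauss_seidel(A, rhs, start, tol=1e-10, max_sweeps=500):
    """Solve A a = rhs by Gauss-Seidel sweeps, updating each unknown in place."""
    a = list(start)
    for sweep in range(1, max_sweeps + 1):
        biggest_change = 0.0
        for i in range(len(a)):
            # Use the newest available values of the other unknowns.
            off_diagonal = sum(A[i][j] * a[j] for j in range(len(a)) if j != i)
            new_value = (rhs[i] - off_diagonal) / A[i][i]
            biggest_change = max(biggest_change, abs(new_value - a[i]))
            a[i] = new_value
        if biggest_change < tol:
            return a, sweep
    raise RuntimeError("Gauss-Seidel did not converge")

a, sweeps = gauss_seidel(A, rhs, start=[2.0, -2.0, 4.0])
residuals = [sum(A[i][j] * a[j] for j in range(3)) - rhs[i] for i in range(3)]
```

On a calculator the same sweep would simply be repeated by hand until the three values stop changing at the precision required.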
31.18
Prove that the expression given for the Student's t-distribution in equation (31.118)
is correctly normalised.
31.19
Verify that the F-distribution P(F) given explicitly in equation (31.126) is symmetric
between the two data samples, i.e. that it retains the same form but with $N_1$
and $N_2$ interchanged, if F is replaced by $F' = F^{-1}$. Symbolically, if $P'(F')$ is the
distribution of $F'$ and $P(F) = \eta(F, N_1, N_2)$, then $P'(F') = \eta(F', N_2, N_1)$.
31.20
It is claimed that the two following sets of values were obtained (a) by randomly
drawing from a normal distribution that is N(0, 1) and then (b) randomly
assigning each reading to one of two sets A and B:
Set A:
Set B:
−0.314
0.610
−0.691
0.603
0.482
1.515
−0.551
−1.757
−1.642
−0.537
0.058
−1.736
−0.160
−1.635
0.719
1.224
1.423
1.165
Make tests, including t- and F-tests, to establish whether there is any evidence
that either claim is, or both claims are, false.