Chapter 2
Simple Probability Samples
2.1 (a) ȳ_U = (98 + 102 + 154 + 133 + 190 + 175)/6 = 142.
(b) For each plan, we first find the sampling distribution of ȳ.
Plan 1:
Sample number   1        2        3        4        5        6        7        8
P(S)            1/8      1/8      1/8      1/8      1/8      1/8      1/8      1/8
ȳ_S             147.33   142.33   140.33   135.33   148.67   143.67   141.67   136.67
(i) E[ȳ] = (1/8)(147.33) + (1/8)(142.33) + ··· + (1/8)(136.67) = 142.
(ii) V[ȳ] = (1/8)(147.33 − 142)² + (1/8)(142.33 − 142)² + ··· + (1/8)(136.67 − 142)² = 18.94.
(iii) Bias[ȳ] = E[ȳ] − ȳ_U = 142 − 142 = 0.
(iv) Since Bias[ȳ] = 0, MSE[ȳ] = V[ȳ] = 18.94.
Plan 2:
Sample number   1        2        3
P(S)            1/4      1/2      1/4
ȳ_S             135.33   143.67   147.33
(i) E[ȳ] = (1/4)(135.33) + (1/2)(143.67) + (1/4)(147.33) = 142.5.
(ii) V[ȳ] = (1/4)(135.33 − 142.5)² + (1/2)(143.67 − 142.5)² + (1/4)(147.33 − 142.5)²
= 12.84 + 0.68 + 5.84
= 19.36.
(iii) Bias[ȳ] = E[ȳ] − ȳ_U = 142.5 − 142 = 0.5.
(iv) MSE[ȳ] = V[ȳ] + (Bias[ȳ])² = 19.61.
(c) Clearly, Plan 1 is better. It has smaller variance and is unbiased as well.
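These bookkeeping calculations are easy to check by machine. A minimal Python sketch (the helper `plan_moments` is my own, not from the text):

```python
from fractions import Fraction as F

def plan_moments(probs, ybars, ybar_U):
    """Expected value, variance, bias, and MSE of ybar over a sampling plan."""
    E = sum(p * y for p, y in zip(probs, ybars))
    V = sum(p * (y - E) ** 2 for p, y in zip(probs, ybars))
    bias = E - ybar_U
    return E, V, bias, V + bias ** 2

# Plan 1: eight equally likely samples.
plan1 = plan_moments([F(1, 8)] * 8,
                     [147.33, 142.33, 140.33, 135.33,
                      148.67, 143.67, 141.67, 136.67], 142)
# Plan 2: three samples with unequal probabilities.
plan2 = plan_moments([F(1, 4), F(1, 2), F(1, 4)],
                     [135.33, 143.67, 147.33], 142)
```

Up to the two-decimal rounding of the tabulated ȳ_S values, this reproduces E = 142, V ≈ 18.94 for Plan 1 and E = 142.5, MSE ≈ 19.61 for Plan 2.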
2.2 (a) Unit 1 appears in samples 1 and 3, so π1 = P(S1) + P(S3) = 1/8 + 1/8 = 1/4.
Similarly,
π2 = 1/4 + 3/8 = 5/8
π3 = 1/8 + 1/4 = 3/8
π4 = 1/8 + 3/8 + 1/8 = 5/8
π5 = 1/8 + 1/8 = 1/4
π6 = 1/8 + 1/8 + 3/8 = 5/8
π7 = 1/4 + 1/8 = 3/8
π8 = 1/4 + 1/8 + 3/8 + 1/8 = 7/8.
Note that Σ_{i=1}^{8} π_i = 4 = n.
(b)

Sample, S       P(S)   t̂
{1, 3, 5, 6}    1/8    38
{2, 3, 7, 8}    1/4    42
{1, 4, 6, 8}    1/8    40
{2, 4, 6, 8}    3/8    42
{4, 5, 7, 8}    1/8    52
Thus the sampling distribution of t̂ is:

k           38    40    42    52
P(t̂ = k)   1/8   1/8   5/8   1/8
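The inclusion probabilities and the distribution of t̂ can be tallied mechanically; a small Python check using exact fractions (the data layout is mine):

```python
from fractions import Fraction as F
from collections import defaultdict

# The five samples from part (b), their selection probabilities,
# and the tabulated values of t-hat.
samples = [({1, 3, 5, 6}, F(1, 8), 38),
           ({2, 3, 7, 8}, F(1, 4), 42),
           ({1, 4, 6, 8}, F(1, 8), 40),
           ({2, 4, 6, 8}, F(3, 8), 42),
           ({4, 5, 7, 8}, F(1, 8), 52)]

# pi_i = sum of P(S) over the samples that contain unit i.
pi = {i: sum(p for s, p, _ in samples if i in s) for i in range(1, 9)}

# Sampling distribution of t-hat.
dist = defaultdict(F)
for s, p, t in samples:
    dist[t] += p
```

The tallies reproduce π1 = 1/4, π8 = 7/8, Σπ_i = 4 = n, and P(t̂ = 42) = 1/4 + 3/8 = 5/8.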
2.3 No, because thick books have a higher inclusion probability than thin books.
2.4 (a) A total of C(8, 3) = 56 samples are possible, each with probability of selection 1/56. The R function samplist below will (inefficiently!) generate each of the 56 samples. To find the sampling distribution of ȳ, I used the commands
samplist <- function(popn, sampsize){
  popvals <- 1:length(popn)
  temp <- comblist(popvals, sampsize)
  matrix(popn[t(temp)], nrow = nrow(temp), byrow = T)
}

comblist <- function(popvals, sampsize)
{
  popsize <- length(popvals)
  if(sampsize > popsize)
    stop("sample size cannot exceed population size")
  nvals <- popsize - sampsize + 1
  nrows <- prod((popsize - sampsize + 1):popsize)/prod(1:sampsize)
  ncols <- sampsize
  yy <- matrix(nrow = nrows, ncol = ncols)
  if(sampsize == 1) {yy <- popvals}
  else {
    nvals <- popsize - sampsize + 1
    nrows <- prod(nvals:popsize)/prod(1:sampsize)
    ncols <- sampsize
    yy <- matrix(nrow = nrows, ncol = ncols)
    rep1 <- rep(1, nvals)
    if(nvals > 1) {
      for(i in 2:nvals)
        rep1[i] <- (rep1[i - 1] * (sampsize + i - 2))/(i - 1)
    }
    rep1 <- rev(rep1)
    yy[, 1] <- rep(popvals[1:nvals], rep1)
    for(i in 1:nvals) {
      yy[yy[, 1] == popvals[i], 2:ncols] <- Recall(
        popvals[(i + 1):popsize], sampsize - 1)
    }
  }
  yy
}
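The same enumeration can be sketched much more briefly with Python's itertools.combinations (this is an illustrative port of the idea, not code from the text):

```python
from itertools import combinations
from collections import Counter

popn = [1, 2, 4, 4, 7, 7, 7, 8]  # the population of Exercise 2.4

# Enumerate all C(8,3) = 56 equally likely samples by index, so that
# tied values (the two 4s, three 7s) count as distinct units.
samples = list(combinations(range(len(popn)), 3))
ybars = [sum(popn[i] for i in s) / 3 for s in samples]

# Sampling distribution of ybar; each sample has probability 1/56.
dist = Counter(round(y, 2) for y in ybars)
```

Averaging the 56 sample means recovers E[ȳ] = ȳ_U = 40/8 = 5, as it must for an SRS.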
temp1 <- samplist(c(1, 2, 4, 4, 7, 7, 7, 8), 3)
temp2 <- apply(temp1, 1, mean)

Define T_1 = min{i : U_i ∈ [1, N]} and

T_k = min{i > T_{k−1} : U_i ∈ [1, N], U_i ∉ {U_{T_1}, …, U_{T_{k−1}}}}

for k = 2, …, n. Then for {x1, …, xn} a set of n distinct elements in {1, …, N},
P(S = {x1, …, xn}) = P({U_{T_1}, …, U_{T_n}} = {x1, …, xn}).

Now

P{U_{T_1} = x1, …, U_{T_n} = xn} = E[ P{U_{T_1} = x1, …, U_{T_n} = xn | T_1, T_2, …, T_n} ]
= (1/N)(1/(N − 1))(1/(N − 2)) ··· (1/(N − n + 1))
= (N − n)!/N!.
Conditional on the stopping times T_1, …, T_n, U_{T_1} is discrete uniform on {1, …, N}; (U_{T_2} | T_1, …, T_n, U_{T_1}) is discrete uniform on {1, …, N} − {U_{T_1}}, and so on. Since x1, …, xn are arbitrary and the set {x1, …, xn} can arise from any of its n! orderings,
P(S = {x1, …, xn}) = n!(N − n)!/N! = 1/C(N, n),
so the procedure results in a simple random sample.
(b) This procedure does not result in a simple random sample. Units starting with 5, 6, or 7 are more likely to be in the sample than units starting with 0 or 1. To see this, let's look at a simpler case: selecting one number between 1 and 74 using this procedure.
Let U1, U2, … be independent random variables, each with a discrete uniform distribution on {0, …, 9}. Then the first random number considered in the sequence is 10U1 + U2; if that number is not between 1 and 74, then 10U2 + U3 is considered, etc. Let

T = min{i : 10U_i + U_{i+1} ∈ [1, 74]}.

Then for x = 10x1 + x2, x ∈ [1, 74],

P(S = {x}) = P(10U_T + U_{T+1} = x) = P(U_T = x1, U_{T+1} = x2).
For part (a), the stopping times were irrelevant for the distribution of UT1 , . . . , UTn ;
here, though, the stopping time makes a difference. One way to have T = 2 is if
10U1 + U2 = 75. In that case, you have rejected the first number solely because the
second digit is too large, but that second digit becomes the first digit of the random
number selected. To see this formally, note that
P(S = {x}) = P( 10U1 + U2 = x, or {10U1 + U2 ∉ [1, 74] and 10U2 + U3 = x},
    or {10U1 + U2 ∉ [1, 74] and 10U2 + U3 ∉ [1, 74] and 10U3 + U4 = x}, or … )
= P(U1 = x1, U2 = x2)
+ Σ_{t=2}^{∞} P( ∩_{i=1}^{t−1} {U_i > 7 or [U_i = 7 and U_{i+1} > 4]} and U_t = x1 and U_{t+1} = x2 ).
Every term in the series is larger if x1 > 4 than if x1 โค 4.
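A quick seeded simulation (my own sketch, not from the text) makes the bias visible: numbers 50–74 turn up more often, per number, than numbers 1–49.

```python
import random
from collections import Counter

random.seed(1)

def draw():
    """One draw of the flawed scheme: slide a one-digit window along the
    digit stream until the two-digit number falls in [1, 74]."""
    d1 = random.randrange(10)
    while True:
        d2 = random.randrange(10)
        x = 10 * d1 + d2
        if 1 <= x <= 74:
            return x
        d1 = d2  # the rejected second digit becomes the next first digit

counts = Counter(draw() for _ in range(200_000))
per_number_low = sum(counts[x] for x in range(1, 50)) / 49    # first digit 0-4
per_number_high = sum(counts[x] for x in range(50, 75)) / 25  # first digit 5-7
```

With 200,000 draws the average count per number in 50–74 is visibly larger than in 1–49, matching the argument above.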
(c) This method almost works, but not quite. For the first draw, the probability that 131 (or any number in {1, …, 149, 170}) is selected is 6/1000; the probability that 154 (or any number in {150, …, 169}) is selected is 5/1000.
(d) This clearly does not produce an SRS, because no odd numbers can be included.
(e) If class sizes are unequal, this procedure does not result in an SRS: students in
smaller classes are more likely to be selected for the sample than are students in
larger classes.
Consider the probability that student j in class i is chosen on the first draw.
P{select student j in class i} = P{select class i} P{select student j | class i}
= (1/20) · 1/(number of students in class i).
(f) Let's look at the probability that student j in class i is chosen for the first unit in the sample. Let U1, U2, … be independent discrete uniform {1, …, 20} and let V1, V2, … be independent discrete uniform {1, …, 40}. Let M_i denote the number of students in class i, with K = Σ_{i=1}^{20} M_i. Then, because all random variables are independent,
P(student j in class i selected)
= P(U1 = i, V1 = j) + P(U2 = i, V2 = j) P( ∪_{k=1}^{20} {U1 = k, V1 > M_k} ) + ···
+ P(U_{l+1} = i, V_{l+1} = j) ∏_{q=1}^{l} P( ∪_{k=1}^{20} {U_q = k, V_q > M_k} ) + ···
= (1/20)(1/40) Σ_{l=0}^{∞} [ Σ_{k=1}^{20} (1/20) · (40 − M_k)/40 ]^l
= (1/800) Σ_{l=0}^{∞} [1 − K/800]^l
= (1/800) · 1/(1 − (1 − K/800))
= 1/K.
Thus, before duplicates are eliminated, a student has probability 1/K of being
selected on any given draw. The argument in part (a) may then be used to show
that when duplicates are discarded, the resulting sample is an SRS.
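The geometric-series step is easy to verify numerically for arbitrary class sizes; a small sketch with made-up M_i:

```python
import random

random.seed(7)

# Hypothetical class sizes (each at most 40, so V can cover any class).
M = [random.randint(1, 40) for _ in range(20)]
K = sum(M)

# P(a given student is selected on one draw):
# (1/800) * sum over l of (1 - K/800)^l, truncated far into the tail.
p = sum((1 / 800) * (1 - K / 800) ** l for l in range(5_000))
```

The truncated series agrees with 1/K to machine precision, regardless of the particular class sizes drawn.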
2.22 (a) From (2.13),

CV(ȳ) = √V(ȳ) / E(ȳ) = √(1 − n/N) · S/(√n · ȳ_U).

Substituting p̂ for ȳ, and N p(1 − p)/(N − 1) for S², we have

CV(p̂) = √[ (1 − n/N) · N p(1 − p) / ((N − 1) n p²) ] = √[ ((N − n)/(N − 1)) · (1 − p)/(n p) ].

The CV for a sample of size 1 is √((1 − p)/p). The sample size in (2.26) will be z²_{α/2} CV²/r².
(b) I used Excel to calculate these values.

p          0.001     0.005    0.01     0.05    0.1     0.3
Fixed      4.3       21.2     42.3     202.8   384.2   896.4
Relative   4264176   849420   422576   81100   38416   9959.7

p          0.5      0.7      0.9     0.95    0.99   0.995   0.999
Fixed      1067.1   896.4    384.2   202.8   42.3   21.2    4.3
Relative   4268.4   1829.3   474.3   224.7   43.1   21.4    4.3
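The table entries are reproducible from the formulas above; the constants below (z = 1.96 and a 3% margin, absolute for "Fixed" and relative for "Relative") are inferred from the numbers, not stated in the text:

```python
def n_fixed(p, e=0.03, z=1.96):
    """SRS size for absolute margin e, ignoring the fpc: z^2 p(1-p) / e^2."""
    return z * z * p * (1 - p) / e ** 2

def n_relative(p, r=0.03, z=1.96):
    """SRS size for relative margin r: z^2 CV^2 / r^2 with CV^2 = (1-p)/p."""
    return z * z * (1 - p) / (p * r ** 2)

row_fixed = [round(n_fixed(p), 1) for p in (0.001, 0.1, 0.5)]
row_relative = [round(n_relative(p), 1) for p in (0.001, 0.1, 0.5)]
```

Note how the relative-precision requirement explodes as p → 0: estimating a rare proportion to within 3% of itself takes millions of observations.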
2.23

P(no missing data) = C(3059, 300) · C(19, 0) / C(3078, 300)
= (2778)(2777) ··· (2760) / [(3078)(3077) ··· (3060)]
= 0.1416421.
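Python's math.comb confirms both forms of the calculation:

```python
from math import comb

# Exact hypergeometric probability: none of the 19 nonrespondents
# fall in an SRS of 300 drawn from 3078.
p = comb(3059, 300) * comb(19, 0) / comb(3078, 300)

# The same value as the telescoping product (2778)(2777)...(2760)
# over (3078)(3077)...(3060).
q = 1.0
for i in range(19):
    q *= (2778 - i) / (3078 - i)
```

Both routes give 0.1416421 to the printed precision.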
2.24

g(n) = L(n) + C(n) = k(1 − n/N) S²/n + c₀ + c₁ n.

dg/dn = −kS²/n² + c₁.

Setting the derivative equal to 0 and solving for n gives

n = √(kS²/c₁).
The sample size, in the decision theoretic approach, should be larger if the cost of a
bad estimate, k, or the variance, S 2 , is larger; the sample size is smaller if the cost
of sampling is larger.
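A numeric sanity check of the optimum, with illustrative constants (k, S², c₀, c₁, N are made up for the check):

```python
k, S2, c0, c1, N = 1000.0, 50.0, 10.0, 2.0, 10_000

def g(n):
    """Total loss plus cost: k(1 - n/N) S^2 / n + c0 + c1 * n."""
    return k * (1 - n / N) * S2 / n + c0 + c1 * n

n_opt = (k * S2 / c1) ** 0.5  # closed form sqrt(k S^2 / c1), about 158.1
```

Since g(n) = kS²/n − kS²/N + c₀ + c₁n is strictly convex in n, g evaluated a little to either side of n_opt is larger, confirming the closed form.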
2.25 (a) Skewed, with tail on right.
(b) ȳ = 20.15, s² = 321.357, SE[ȳ] = 1.63.
2.26 In a systematic sample, the population is partitioned into k clusters, each of size n. One of these clusters is selected with probability 1/k, so π_i = 1/k for each i. But many of the samples that could be selected in an SRS cannot be selected in a systematic sample. For example,

P(Z1 = 1, …, Zn = 1) = 0:

since every kth unit is selected, the sample cannot consist of the first n units in the population.
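A concrete illustration with N = 12, n = 3, k = 4 (my own toy example):

```python
# The k possible systematic samples are {i, i+k, i+2k} for start i = 1..k.
N, n, k = 12, 3, 4
samples = [set(range(start, N + 1, k)) for start in range(1, k + 1)]

# Every unit lies in exactly one of the k equally likely samples,
# so pi_i = 1/k for all i ...
appearances = {i: sum(i in s for s in samples) for i in range(1, N + 1)}

# ... yet most size-n subsets can never occur, e.g. the first n units.
first_n_possible = {1, 2, 3} in samples
```

So all inclusion probabilities match an SRS, but the joint distribution over samples does not.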
2.27 (a)

P(you are in sample) = C(99,999,999, 999) · C(1, 1) / C(100,000,000, 1000)
= [99,999,999! · 1000! · 99,999,000!] / [999! · 99,999,000! · 100,000,000!]
= 1000/100,000,000 = 1/100,000.
(b)

P(you are not in any of the 2000 samples) = (1 − 1/100,000)^2000 = 0.9802.
(c) P(you are not in any of x samples) = (1 − 1/100,000)^x. Solving for x in (1 − 1/100,000)^x = 0.5 gives x log(0.99999) = log(0.5), or x = 69314.4. Almost 70,000 samples need to be taken! This problem provides an answer to the common question, "Why haven't I been sampled in a poll?"
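The arithmetic is easy to replicate:

```python
import math

p_in = 1000 / 100_000_000        # = 1/100,000: chance of being in one poll
p_never = (1 - p_in) ** 2000     # chance of missing all 2000 polls

# Number of polls at which P(never sampled) drops to 1/2.
x = math.log(0.5) / math.log(1 - p_in)
```

This gives p_never ≈ 0.9802 and x ≈ 69,314 polls for even odds of ever being sampled.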
2.28 (a) We can think of drawing a simple random sample with replacement as performing an experiment n independent times; on each trial, outcome i (for i ∈ {1, …, N}) occurs with probability p_i = 1/N. This describes a multinomial experiment. We may then use properties of the multinomial distribution to answer parts (b) and (c):
E[Q_i] = n p_i = n/N,

V[Q_i] = n p_i(1 − p_i) = (n/N)(1 − 1/N),

and

Cov[Q_i, Q_j] = −n p_i p_j = −(n/N)(1/N) for i ≠ j.
(b)

E[t̂] = E[ (N/n) Σ_{i=1}^{N} Q_i y_i ] = (N/n) Σ_{i=1}^{N} (n/N) y_i = t.
(c)

V[t̂] = (N/n)² V[ Σ_{i=1}^{N} Q_i y_i ]

= (N/n)² Σ_{i=1}^{N} Σ_{j=1}^{N} y_i y_j Cov[Q_i, Q_j]

= (N/n)² { Σ_{i=1}^{N} y_i² n p_i(1 − p_i) + Σ_{i=1}^{N} Σ_{j≠i} y_i y_j (−n p_i p_j) }

= (N/n)² { (n/N)(1 − 1/N) Σ_{i=1}^{N} y_i² − (n/N)(1/N) Σ_{i=1}^{N} Σ_{j≠i} y_i y_j }

= (N/n) { Σ_{i=1}^{N} y_i² − N ȳ_U² }

= N² · [ (1/N) Σ_{i=1}^{N} (y_i − ȳ_U)² ] / n,

using Σ_{i=1}^{N} Σ_{j≠i} y_i y_j = N² ȳ_U² − Σ_{i=1}^{N} y_i² in the next-to-last step.
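Parts (b) and (c) can be checked exactly on a toy population by enumerating all N^n equally likely ordered with-replacement draws (the population values are made up):

```python
from itertools import product

y = [3.0, 7.0, 11.0]   # illustrative population
N, n = len(y), 2
t = sum(y)
ybar_U = t / N
sigma2 = sum((yi - ybar_U) ** 2 for yi in y) / N  # (1/N) sum (y_i - ybar_U)^2

# t-hat = (N/n) * sum_i Q_i y_i = (N/n) * (sum of the drawn values).
tvals = [(N / n) * sum(y[i] for i in draw)
         for draw in product(range(N), repeat=n)]
E_that = sum(tvals) / len(tvals)
V_that = sum((v - E_that) ** 2 for v in tvals) / len(tvals)
```

The enumeration gives E[t̂] = t exactly and V[t̂] = N²σ²/n, matching the derivation.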
2.29 We use induction. Clearly, S_0 is an SRS of size n from a population of size n. Now suppose S_{k−1} is an SRS of size n from 𝒰_{k−1} = {1, 2, …, n + k − 1}, where k ≥ 1. We wish to show that S_k is an SRS of size n from 𝒰_k = {1, 2, …, n + k}. Since S_{k−1} is an SRS, we know that

P(S_{k−1}) = 1/C(n + k − 1, n) = n!(k − 1)!/(n + k − 1)!.
Now let U_k ∼ Uniform(0, 1), let V_k be discrete uniform on {1, …, n}, and suppose U_k and V_k are independent. Let A be a subset of size n of 𝒰_k. If A does not contain unit n + k, then A can be achieved as a sample at step k − 1 and

P(S_k = A) = P( S_{k−1} = A and U_k > n/(n + k) )
= P(S_{k−1}) · k/(n + k)
= n!k!/(n + k)!.
If A does contain unit n + k, then the sample at step k − 1 must contain A_{k−1} = A − {n + k} plus one other unit among the k units not in A_{k−1}:

P(S_k = A) = Σ_{j ∈ 𝒰_{k−1} − A_{k−1}} P( S_{k−1} = A_{k−1} ∪ {j} and U_k ≤ n/(n + k) and V_k = j )
= k · [n!(k − 1)!/(n + k − 1)!] · [n/(n + k)] · (1/n)
= n!k!/(n + k)!.
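The induction can be verified exactly for small n and N by propagating the full probability distribution over samples under this scheme (a reservoir-style update, coded as a sketch):

```python
from fractions import Fraction as F

n, N = 2, 5
dist = {frozenset(range(1, n + 1)): F(1)}   # S_0 = {1, ..., n}

for k in range(1, N - n + 1):
    new_unit = n + k
    p_take = F(n, n + k)   # P(U_k <= n/(n+k)): bring in unit n+k
    new_dist = {}
    for s, p in dist.items():
        # keep S_{k-1} when U_k > n/(n+k)
        new_dist[s] = new_dist.get(s, F(0)) + p * (1 - p_take)
        # otherwise V_k picks which current member the new unit replaces
        for j in s:
            t = frozenset((s - {j}) | {new_unit})
            new_dist[t] = new_dist.get(t, F(0)) + p * p_take / n
    dist = new_dist
```

With exact fractions, every one of the C(5, 2) = 10 possible samples ends with probability exactly 1/10, as the induction promises.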
2.30 I always use this activity in my classes. Students generally get estimates of the total area that are biased upwards for the purposive sample. They think, when looking at the picture, that they don't have enough of the big rectangles and so tend to oversample them. This is also a good activity for reviewing confidence intervals and other concepts from an introductory statistics class.