Solution Manual for Fundamentals of Biostatistics, 8th Edition
Complete Solutions Manual
to Accompany
Fundamentals of Biostatistics
© Cengage Learning. All rights reserved. No distribution allowed without express authorization.
EIGHTH EDITION
Bernard Rosner
Harvard University,
Cambridge, MA
Prepared by
Roland A. Matsouaka
Duke University, Durham, NC
Australia • Brazil • Mexico • Singapore • United Kingdom • United States
ISBN-13: 978-1-305-26905-7
ISBN-10: 1-305-26905-5
© 2016 Cengage Learning
ALL RIGHTS RESERVED. No part of this work covered by the
copyright herein may be reproduced, transmitted, stored, or
used in any form or by any means graphic, electronic, or
mechanical, including but not limited to photocopying,
recording, scanning, digitizing, taping, Web distribution,
information networks, or information storage and retrieval
systems, except as permitted under Section 107 or 108 of the
1976 United States Copyright Act, without the prior written
permission of the publisher except as may be permitted by the
license terms below.
For product information and technology assistance, contact us at
Cengage Learning Customer & Sales Support,
1-800-354-9706.
For permission to use material from this text or product, submit
all requests online at www.cengage.com/permissions
Further permissions questions can be emailed to
[email protected].
Cengage Learning
20 Channel Center Street, 4th Floor
Boston, MA 02210
USA
Cengage Learning is a leading provider of customized
learning solutions with office locations around the globe,
including Singapore, the United Kingdom, Australia,
Mexico, Brazil, and Japan. Locate your local office at:
www.cengage.com/global.
Cengage Learning products are represented in
Canada by Nelson Education, Ltd.
To learn more about Cengage Learning Solutions,
visit www.cengage.com.
Purchase any of our products at your local college
store or at our preferred online store
www.cengagebrain.com.
NOTE: UNDER NO CIRCUMSTANCES MAY THIS MATERIAL OR ANY PORTION THEREOF BE SOLD, LICENSED, AUCTIONED,
OR OTHERWISE REDISTRIBUTED EXCEPT AS MAY BE PERMITTED BY THE LICENSE TERMS HEREIN.
READ IMPORTANT LICENSE INFORMATION
Dear Professor or Other Supplement Recipient:
Cengage Learning has provided you with this product (the
“Supplement”) for your review and, to the extent that you adopt
the associated textbook for use in connection with your course
(the “Course”), you and your students who purchase the textbook
may use the Supplement as described below. Cengage Learning
has established these use limitations in response to concerns
raised by authors, professors, and other users regarding the
pedagogical problems stemming from unlimited distribution of
Supplements.
Cengage Learning hereby grants you a nontransferable license to
use the Supplement in connection with the Course, subject to the
following conditions. The Supplement is for your personal,
noncommercial use only and may not be reproduced, posted
electronically or distributed, except that portions of the
Supplement may be provided to your students IN PRINT FORM
ONLY in connection with your instruction of the Course, so long
as such students are advised that they
may not copy or distribute any portion of the Supplement to any
third party. You may not sell, license, auction, or otherwise
redistribute the Supplement in any form. We ask that you take
reasonable steps to protect the Supplement from unauthorized
use, reproduction, or distribution. Your use of the Supplement
indicates your acceptance of the conditions set forth in this
Agreement. If you do not accept these conditions, you must return
the Supplement unused within 30 days of receipt.
Printed in the United States of America
All rights (including without limitation, copyrights, patents, and
trade secrets) in the Supplement are and will remain the sole and
exclusive property of Cengage Learning and/or its licensors. The
Supplement is furnished by Cengage Learning on an “as is” basis
without any warranties, express or implied. This Agreement will be
governed by and construed pursuant to the laws of the State of
New York, without regard to such State’s conflict of law rules.
Thank you for your assistance in helping to safeguard the integrity of
the content contained in this Supplement. We trust you find the
Supplement a useful teaching tool.
Contents
Chapter 2 Descriptive Statistics
Chapter 3 Probability
Chapter 4 Discrete Probability Distributions
Chapter 5 Continuous Probability Distributions
Chapter 6 Estimation
Chapter 7 Hypothesis Testing: One-Sample Inference
Chapter 8 Hypothesis Testing: Two-Sample Inference
Chapter 9 Nonparametric Methods
Chapter 10 Hypothesis Testing: Categorical Data
Chapter 11 Regression and Correlation Methods
Chapter 12 Multisample Inference
Chapter 13 Design and Analysis Techniques for Epidemiologic Studies
Chapter 14 Hypothesis Testing: Person-Time Data
DESCRIPTIVE STATISTICS
2.1
We have
$$\bar{x} = \frac{\sum_{i=1}^{25} x_i}{n} = \frac{215}{25} = 8.6 \text{ days}$$
$$\text{median} = \left(\frac{n+1}{2}\right)\text{th largest observation} = 13\text{th largest observation} = 8 \text{ days}$$
2.2
We have that
$$s^2 = \frac{\sum_{i=1}^{25} (x_i - \bar{x})^2}{24} = \frac{(5 - 8.6)^2 + \cdots + (4 - 8.6)^2}{24} = \frac{784}{24} = 32.67$$
$$s = \text{standard deviation} = \sqrt{\text{variance}} = 5.72 \text{ days}$$
$$\text{range} = \text{largest} - \text{smallest observation} = 30 - 3 = 27 \text{ days}$$
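As a quick arithmetic check, the 2.1-2.2 results can be reproduced in R from the totals quoted above. The individual lengths of stay are not listed here, so this sketch starts only from n, the sum of the observations, and the sum of squared deviations.

```r
# Totals quoted in 2.1-2.2 for the n = 25 hospital stays (days).
n <- 25
sum_x <- 215   # sum of the 25 observations
ss <- 784      # sum of squared deviations from the mean

x_bar <- sum_x / n           # sample mean
s2 <- ss / (n - 1)           # sample variance divides by n - 1
s <- sqrt(s2)                # sample standard deviation
median_rank <- (n + 1) / 2   # rank of the median when n is odd

round(c(mean = x_bar, variance = s2, sd = s, median_rank = median_rank), 2)
```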
2.3
Suppose we divide the patients according to whether or not they received antibiotics, and calculate the
mean and standard deviation for each of the two subsamples:

Group                                    Mean (days)   SD (days)   n
Antibiotics                              11.57         8.81        7
No antibiotics                           7.44          3.70        18
Antibiotics, excluding observation 7     8.50          3.73        6

It appears that antibiotic users stay longer in the hospital. Note, however, that when we remove observation 7, the
two standard deviations are in substantial agreement, and the difference in the means (8.50 vs. 7.44 days) is much
smaller. This example shows that $\bar{x}$ and $s^2$ are not robust; that is, their values are easily
affected by outliers, particularly in small samples. Therefore, we would not conclude that hospital stay is
different for antibiotic users vs. non-antibiotic users.
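The lack of robustness described in 2.3 is easy to see in a small example. The values below are hypothetical and made up for illustration; they are not the hospital data. One extreme stay plays the role of observation 7.

```r
# Hypothetical lengths of stay (days); the last value is an extreme observation.
stays <- c(5, 7, 8, 9, 10, 11, 30)
trimmed <- stays[-length(stays)]   # same data with the extreme value removed

describe <- function(x) c(mean = mean(x), sd = sd(x), median = median(x))
round(rbind(all_values = describe(stays), without_extreme = describe(trimmed)), 2)
```

Dropping the single extreme value moves the mean by about 3 days and cuts the standard deviation by more than half. The median barely changes.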
CHAPTER 2/DESCRIPTIVE STATISTICS
2.4-2.7
Changing the scale by a positive factor c will multiply each data value $x_i$ by c, changing it to $cx_i$. Again, the same
individual's value will be at the median and the same individual's value will be at the mode, but these
values will be multiplied by c. The geometric mean will also be multiplied by c, as can easily be shown:
$$\text{Geometric mean} = [(cx_1)(cx_2)\cdots(cx_n)]^{1/n} = (c^n x_1 x_2 \cdots x_n)^{1/n} = c\,(x_1 x_2 \cdots x_n)^{1/n} = c \times \text{old geometric mean}$$
The range will also be multiplied by c.
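The scaling rules can be checked numerically. This sketch uses a hypothetical set of values. They are positive so that the geometric mean is defined, and the scale factor is a positive constant, 2.

```r
# Hypothetical positive data and a positive scale factor.
x <- c(2, 3, 5, 8, 13)
scale_factor <- 2

geometric_mean <- function(v) exp(mean(log(v)))   # (v1 * v2 * ... * vn)^(1/n)
summaries <- function(v) c(mean = mean(v), median = median(v),
                           geometric_mean = geometric_mean(v),
                           range = max(v) - min(v))

ratios <- summaries(scale_factor * x) / summaries(x)
ratios   # each summary is multiplied by scale_factor
```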
For example, if c = 2 we have:

Original scale ($x_i$):   −3   −2   −1    0    1    2    3
Rescaled ($2x_i$):        −6   −4   −2    0    2    4    6

2.8
We first read the data file “running time” into R:
> require(xlsx)
> running <- read.xlsx("running time.xlsx", sheetIndex = 1)   # file name assumed; adjust to your copy
> head(running)
  week  time
1    1 12.80
2    2 12.20
3    3 12.25
4    4 12.18
5    5 11.53
6    6 12.47
The mean 1-mile running time over 18 weeks is equal to 12.09 minutes:
> mean(running$time)
[1] 12.08889
2.9
The standard deviation is given by
> sd(running$time)
[1] 0.3874181
2.10
Let us first create the variable “time_100” and then calculate its mean and standard deviation:
> running$time_100 <- 100 * running$time
> mean(running$time_100)
[1] 1208.889
> sd(running$time_100)
[1] 38.74181
2.11
Let us construct the stem-and-leaf plot in R using the stem.leaf command from the package “aplpack”:
> require(aplpack)
> stem.leaf(running$time_100, unit = 1, trim.outliers = FALSE)
1 | 2: represents 12
 leaf unit: 1
            n: 18
    2    115 | 37
    3    116 | 7
    5    117 | 23
    7    118 | 03
    8    119 | 2
  (1)    120 | 8
    9    121 | 8
    8    122 | 05
    6    123 | 03
    4    124 | 7
    3    125 | 5
    2    126 | 7
         127 |
    1    128 | 0
Note: one can also use the base R command stem (which does not require the “aplpack” package) to get a similar plot:
> stem(running$time_100, scale = 4)
2.12
The quantiles of the running times are
> quantile(running$time)
     0%     25%     50%     75%    100%
11.5300 11.7475 12.1300 12.3225 12.8000
[Figure: Box plot of running times (vertical axis: Time)]
An outlying value is identified as any value x such that
$$x > \text{upper quartile} + 1.5 \times (\text{upper quartile} - \text{lower quartile}) = 12.32 + 1.5 \times (12.32 - 11.75) = 12.32 + 0.85 = 13.17$$
Since 12.97 minutes is below this cutoff (13.17 minutes), the running time recorded in his first week of
running in the spring is not an outlying value relative to the distribution of running times recorded the
previous year.
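The outlier screen in 2.12 can be reproduced in R. The 18 running times below are read off the stem-and-leaf plot in 2.11, where stems and leaves are in hundredths of a minute. These values reproduce the mean of 12.08889 and the quartiles printed by quantile(). This sketch uses R's default quantile() (type 7), as in 2.12. Computing the fence without first rounding the quartiles gives 13.185 instead of 13.17, and the conclusion is the same.

```r
# 1-mile running times (minutes), recovered from the stem-and-leaf plot in 2.11.
time <- c(11.53, 11.57, 11.67, 11.72, 11.73, 11.80, 11.83, 11.92, 12.08,
          12.18, 12.20, 12.25, 12.30, 12.33, 12.47, 12.55, 12.67, 12.80)

quartiles <- quantile(time, c(0.25, 0.75), names = FALSE)   # type 7 (R default)
iqr <- quartiles[2] - quartiles[1]
upper_fence <- quartiles[2] + 1.5 * iqr   # values above this are outlying
lower_fence <- quartiles[1] - 1.5 * iqr   # values below this are outlying

spring_time <- 12.97
is_outlier <- spring_time > upper_fence || spring_time < lower_fence
print(upper_fence)
print(is_outlier)
```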
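2.13
The mean is
$$\bar{x} = \frac{\sum_{i=1}^{24} x_i}{24} = \frac{469}{24} = 19.54 \text{ mg/dL}$$
2.14
We have that
$$s^2 = \frac{\sum_{i=1}^{24} (x_i - \bar{x})^2}{23} = \frac{(49 - 19.54)^2 + \cdots + (12 - 19.54)^2}{23} = \frac{6495.96}{23} = 282.43$$
$$s = \sqrt{282.43} = 16.81 \text{ mg/dL}$$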
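The 24 change scores can be read off the stem-and-leaf plot in 2.15, and they sum to 469, as in 2.13. This R sketch reproduces the mean, the sum of squared deviations, the variance, and the standard deviation. The sum of squared deviations uses the shortcut formula Σx² − (Σx)²/n.

```r
# Change in cholesterol (mg/dL), recovered from the stem-and-leaf plot in 2.15.
chg <- c(-13, -10, -8, 2, 8, 8, 12, 13, 13, 16, 19, 19, 19,
         21, 23, 27, 28, 31, 32, 35, 36, 41, 48, 49)
n <- length(chg)

x_bar <- sum(chg) / n
ss <- sum(chg^2) - sum(chg)^2 / n   # equals sum((chg - x_bar)^2)
s2 <- ss / (n - 1)
s <- sqrt(s2)
round(c(mean = x_bar, ss = ss, variance = s2, sd = s), 2)
```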
2.15
We provide two rows for each stem, corresponding to leaves 5-9 and 0-4, respectively. We have

Stem-and-leaf plot      Cumulative frequency
+4 | 98                 24
+4 | 1                  22
+3 | 65                 21
+3 | 21                 19
+2 | 78                 17
+2 | 13                 15
+1 | 9699               13
+1 | 332                 9
+0 | 88                  6
+0 | 2                   4
−0 |
−0 | 8                   3
−1 | 03                  2
2.16
We wish to compute the average of the (24/2)th and (24/2 + 1)th largest values, that is, the average of the 12th
and 13th largest points. We note from the stem-and-leaf plot that the 13th largest point, counting from the
bottom, is the largest value in the upper +1 row = 19. The 12th largest point is the next largest value in this
row = 19. Thus,
$$\text{median} = \frac{19 + 19}{2} = 19 \text{ mg/dL}$$
2.17
We first must compute the upper and lower quartiles. Because $24 \times (75/100) = 18$ is an integer, the upper
quartile = average of the 18th and 19th largest values = $\frac{32 + 31}{2} = 31.5$. Similarly, because
$24 \times (25/100) = 6$ is an integer, the lower quartile = average of the 6th and 7th smallest
points = $\frac{8 + 12}{2} = 10$.
Second, we identify outlying values. An outlying value is identified as any value x such that
$$x > \text{upper quartile} + 1.5 \times (\text{upper quartile} - \text{lower quartile}) = 31.5 + 1.5 \times (31.5 - 10) = 31.5 + 32.25 = 63.75$$
or
$$x < \text{lower quartile} - 1.5 \times (\text{upper quartile} - \text{lower quartile}) = 10 - 1.5 \times (31.5 - 10) = 10 - 32.25 = -22.25$$
From the stem-and-leaf plot, we note that the range is from −13 to +49. Therefore, there are no outlying
values. Thus, the box plot is as follows:
[Figure: stem-and-leaf plot (as in 2.15) with the corresponding box plot drawn alongside; no outlying values]
Comments: The distribution is reasonably symmetric, since the mean = 19.54 mg/dL ≅ 19 mg/dL = median.
This is also manifested by the percentiles of the distribution, since upper quartile − median = 31.5 − 19 = 12.5
≅ median − lower quartile = 19 − 10 = 9. The box plot looks deceptively asymmetric, since 19 is the highest
value in the upper +1 row, while the lower quartile (10) falls below the lowest value (12) in the lower +1 row.
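The percentile rule used in 2.16-2.17 averages the (np)th and (np + 1)th ordered values when np is an integer. This corresponds to type = 2 in R's quantile(). R's default, type = 7, would give different quartiles here; for example, the lower quartile would be 11 instead of 10. The sketch below reproduces the quartiles and the outlier fences, using the change scores read off the stem-and-leaf plot in 2.15.

```r
chg <- c(-13, -10, -8, 2, 8, 8, 12, 13, 13, 16, 19, 19, 19,
         21, 23, 27, 28, 31, 32, 35, 36, 41, 48, 49)

# type = 2: average the (np)th and (np + 1)th order statistics when np is an integer.
q <- quantile(chg, c(0.25, 0.50, 0.75), type = 2, names = FALSE)
iqr <- q[3] - q[1]
fences <- c(lower = q[1] - 1.5 * iqr, upper = q[3] + 1.5 * iqr)
outlying <- chg[chg < fences[["lower"]] | chg > fences[["upper"]]]

q         # lower quartile, median, upper quartile
fences
outlying  # empty: no outlying values
```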
2.18
To compute the median cholesterol level, we construct a stem-and-leaf plot of the before-cholesterol
measurements as follows.

Stem-and-leaf plot      Cumulative frequency
25 | 0                  24
24 | 4                  23
23 | 68                 22
22 | 42                 20
21 |
20 | 5                  18
19 | 5277               17
18 | 0                  13
17 | 8                  12
16 | 698871             11
15 | 981                 5
14 | 5                   2
13 | 7                   1

Based on the cumulative frequency column, we see that the median = average of the 12th and 13th largest
values = $\frac{178 + 180}{2} = 179$ mg/dL. Therefore, we look at the change scores among persons with baseline
cholesterol ≥ 179 mg/dL and < 179 mg/dL, respectively. Stem-and-leaf plots of the change scores in
these two groups are given as follows:

Baseline ≥ 179 mg/dL       Baseline < 179 mg/dL
+4 | 98                    +4 |
+4 |                       +4 | 1
+3 | 65                    +3 |
+3 | 2                     +3 | 1
+2 | 78                    +2 |
+2 | 1                     +2 | 3
+1 | 699                   +1 | 9
+1 |                       +1 | 332
+0 | 8                     +0 | 8
+0 |                       +0 | 2
−0 |                       −0 |
−0 |                       −0 | 8
−1 |                       −1 | 03

Clearly, from the plot, the effect of diet on cholesterol is much greater among individuals who start with
relatively high cholesterol levels (≥ 179 mg/dL) than among those who start with relatively low levels
(< 179 mg/dL). This is also evidenced by the mean change in cholesterol levels in the two groups, which
is 28.2 mg/dL in the ≥ 179 mg/dL group and 10.9 mg/dL in the < 179 mg/dL group. We will discuss the
formal statistical methods for comparing mean changes in two groups in our work on two-sample inference in Chapter 8.
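The two group means quoted in 2.18 can be checked from the leaves of the two stem-and-leaf plots, with 12 change scores in each group. These values reproduce the means of 28.2 and 10.9 mg/dL, and together they sum to 469, as in 2.13.

```r
# Change scores (mg/dL) read off the stem-and-leaf plots in 2.18.
high_baseline <- c(49, 48, 36, 35, 32, 27, 28, 21, 16, 19, 19, 8)     # baseline >= 179 mg/dL
low_baseline  <- c(41, 31, 23, 19, 13, 13, 12, 8, 2, -8, -10, -13)   # baseline < 179 mg/dL

group_means <- c(high = mean(high_baseline), low = mean(low_baseline))
round(group_means, 1)
```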
2.19
We first calculate the difference scores between the two positions:

Subject   Subject   Systolic           Diastolic
number              difference score   difference score
1         B.R.A.    −6                 −8
2         J.A.B.    +2                 −2
3         F.L.B.    +6                 +4
4         V.P.B.    +8                 −4
5         M.F.B.    +8                 +2
6         E.H.B.    +12                +4
7         G.C.      +10                0
8         M.M.C.    0                  −2
9         T.J.F.    −2                 −8
10        R.R.F.    +4                 −2
11        C.R.F.    +8                 −2
12        E.W.G.    +14                +4
13        T.F.H.    +2                 −14
14        E.J.H.    +6                 −2
15        H.B.H.    +26                0
16        R.T.K.    +8                 +8
17        W.E.L.    +10                +4
18        R.L.L.    +12                +2
19        H.S.M.    +14                +8
20        V.J.M.    −8                 −2
21        R.H.P.    +10                +14
22        R.C.R.    +14                +4
23        J.A.R.    +14                0
24        A.K.R.    +4                 +4
25        T.H.S.    +6                 +4
26        O.E.S.    +16                +2
27        R.E.S.    +28                +16
28        E.C.T.    +18                −4
29        J.H.T.    +14                +4
30        F.P.V.    +4                 −6
31        P.F.W.    +12                +6
32        W.J.W.    +8                 −4

Second, we calculate the mean difference scores:
$$\bar{x}_{\text{sys}} = \frac{-6 + \cdots + 8}{32} = \frac{282}{32} = 8.8 \text{ mm Hg}$$
$$\bar{x}_{\text{dias}} = \frac{-8 + \cdots + (-4)}{32} = \frac{30}{32} = 0.9 \text{ mm Hg}$$
The median difference scores are given by the average of the 16th and 17th largest values. Thus,
$$\text{median}_{\text{sys}} = \frac{8 + 8}{2} = 8 \text{ mm Hg}$$
$$\text{median}_{\text{dias}} = \frac{0 + 2}{2} = 1 \text{ mm Hg}$$
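This R sketch reproduces the means and medians of the 32 difference scores, with the values typed in subject-number order from the table in 2.19. When n is even, median() averages the (n/2)th and (n/2 + 1)th ordered values, as in the text.

```r
# Postural change in blood pressure (mm Hg), subjects 1-32.
sys <- c(-6, 2, 6, 8, 8, 12, 10, 0, -2, 4,
         8, 14, 2, 6, 26, 8, 10, 12, 14, -8,
         10, 14, 14, 4, 6, 16, 28, 18, 14, 4, 12, 8)
dias <- c(-8, -2, 4, -4, 2, 4, 0, -2, -8, -2,
          -2, 4, -14, -2, 0, 8, 4, 2, 8, -2,
          14, 4, 0, 4, 4, 2, 16, -4, 4, -6, 6, -4)

rbind(mean = c(systolic = mean(sys), diastolic = mean(dias)),
      median = c(systolic = median(sys), diastolic = median(dias)))
```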
2.20
The stem-and-leaf and box plots, allowing two rows for each stem, are given as follows:

Systolic Blood Pressure
Stem-and-leaf plot      Cumulative frequency
 2 | 68                 32
 2 |
 1 | 68                 30
 1 | 20402404442        28
 0 | 68886868           17
 0 | 204244              9
−0 | 2                   3
−0 | 68                  2
[Figure: box plot of the systolic difference scores]
Median = 8, upper quartile = $\frac{14 + 14}{2} = 14$, lower quartile = $\frac{4 + 4}{2} = 4$. Outlying values:
$x > 14 + 1.5 \times (14 - 4) = 29$ or $x < 4 - 1.5 \times (14 - 4) = -11$. Since the range of values is from −8 to +28,
there are no outlying values for systolic blood pressure.

Diastolic Blood Pressure
Stem-and-leaf plot      Cumulative frequency
 1 | 6                  32
 1 | 4                  31
 0 | 886                30
 0 | 42404042404424     27
−0 | 242222244          13
−0 | 886                 4
−1 | 4                   1
[Figure: box plot of the diastolic difference scores, with the outlying values plotted as individual points]
Median = 1, upper quartile = $\frac{4 + 4}{2} = 4$, lower quartile = $\frac{-2 - 2}{2} = -2$. Outlying values:
$x > 4 + 1.5 \times (4 + 2) = 13.0$ or $x < -2 - 1.5 \times (4 + 2) = -11.0$. The values +16, +14, and −14 are outlying
values.
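The diastolic outlier screen can be reproduced the same way. Here quantile(type = 2) matches the textbook's quartile rule for n = 32: it averages the 8th and 9th ordered values, and the 24th and 25th.

```r
dias <- c(-8, -2, 4, -4, 2, 4, 0, -2, -8, -2,
          -2, 4, -14, -2, 0, 8, 4, 2, 8, -2,
          14, 4, 0, 4, 4, 2, 16, -4, 4, -6, 6, -4)

q <- quantile(dias, c(0.25, 0.75), type = 2, names = FALSE)   # lower and upper quartiles
iqr <- q[2] - q[1]
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr
sort(dias[dias < lower_fence | dias > upper_fence])         # the outlying values
```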
2.21
Systolic blood pressure clearly seems to be higher in the supine (recumbent) position than in the standing
position. Diastolic blood pressure appears to be comparable in the two positions. The distributions are
each reasonably symmetric.
2.22
The upper and lower deciles for postural change in systolic blood pressure (SBP) are 16 and 0 (with n = 32,
the upper decile is the 29th smallest value, since 32 × 0.9 = 28.8 is not an integer). Thus, the
normal range for postural change in SBP is $0 \le x \le 16$. The upper and lower deciles for postural change
in diastolic blood pressure (DBP) are 8 and −6. Thus, the normal range for postural change in DBP is
$-6 \le x \le 8$.
2.23
Id       Age        FEV        Hgt        Sex        Smoke
301      9          1.708      57         0          0
451      8          1.724      67.5       0          0
……
61951    15         2.278      60         0          1
63241    16         4.504      72         1          0
71141    17         5.638      70         1          0
71142    16         4.872      72         1          1
73041    16         4.27       67         1          1
73042    15         3.727      68         1          1
73751    18         2.853      60         0          0
75852    16         2.795      63         0          1
77151    15         3.211      66.5       0          0
MEAN     9.931193   2.63678    61.14358   0.513761   0.099388
MEDIAN   10         2.5475     61.5
SD       2.953935   0.867059   5.703513

[Figures: Histogram of Age; Boxplot of FEV; Boxplot of Hgt; Chart of Sex; Chart of Smoke]
2.24
Descriptive statistics for FEV by age, separately by sex:

Results for Sex = 0
Age   Mean     StDev    Minimum   Median   Maximum
3     1.0720   *        1.0720    1.0720   1.0720
4     1.316    0.290    0.839     1.404    1.577
5     1.3599   0.2513   0.7910    1.3715   1.7040
6     1.6477   0.2182   1.3380    1.6720   2.1020
7     1.8330   0.3136   1.3700    1.7420   2.5640
8     2.1490   0.4046   1.2920    2.1900   2.9930
9     2.3753   0.4407   1.5910    2.3810   3.2230
10    2.6814   0.4304   1.4580    2.6895   3.4130
11    2.8482   0.4293   2.0810    2.8220   3.7740
12    2.9481   0.3679   2.3470    2.8890   3.8350
13    3.0656   0.4321   2.2160    3.1135   3.8160
14    2.962    0.383    2.236     2.997    3.428
15    2.761    0.415    2.198     2.783    3.330
16    3.058    0.397    2.608     2.942    3.674
17    3.5000   *        3.5000    3.5000   3.5000
18    2.9470   0.1199   2.8530    2.9060   3.0820
19    3.4320   0.1230   3.3450    3.4320   3.5190

Results for Sex = 1
Age   Mean     StDev    Minimum   Median   Maximum
3     1.4040   *        1.4040    1.4040   1.4040
4     1.196    0.524    0.796     1.004    1.789
5     1.7447   0.2336   1.3590    1.7920   2.1150
6     1.6650   0.2304   1.3380    1.6580   2.2620
7     1.9117   0.3594   1.1650    1.9050   2.5780
8     2.0756   0.3767   1.4290    2.0690   2.9270
9     2.4822   0.5086   1.5580    2.4570   3.8420
10    2.6965   0.6020   1.6650    2.6080   4.5910
11    3.2304   0.6459   1.6940    3.2060   4.6370
12    3.509    0.871    1.916     3.530    5.224
13    4.011    0.690    2.531     4.045    5.083
14    3.931    0.635    2.276     3.882    4.842
15    4.289    0.644    3.727     4.279    5.793
16    4.193    0.437    3.645     4.270    4.872
17    4.410    1.006    3.082     4.429    5.638
18    4.2367   0.1597   4.0860    4.2200   4.4040
19    5.1020   *        5.1020    5.1020   5.1020
Mean FEV by height, separately by sex:

Results for Sex = 0
Hgt   Mean FEV   Hgt   Mean FEV   Hgt   Mean FEV   Hgt   Mean FEV
46.0  1.0720     46.5  1.1960     48.0  1.110      49.0  1.4193
50.0  1.3378     51.0  1.5800     51.5  1.474      52.0  1.389
52.5  1.577      53.0  1.6887     53.5  1.4150     54.0  1.6408
54.5  1.7483     55.0  1.6313     55.5  2.036      56.0  1.651
56.5  1.7875     57.0  1.9037     57.5  1.9300     58.0  2.1934
58.5  1.9440     59.0  2.1996     59.5  2.517      60.0  2.5659
60.5  2.5563     61.0  2.6981     61.5  2.626      62.0  2.7861
62.5  2.7777     63.0  2.7266     63.5  2.995      64.0  2.9731
64.5  2.864      65.0  3.090      65.4  2.4340     65.5  3.154
66.0  2.984      66.5  3.2843     67.0  3.167      67.5  2.922
68.0  3.214      68.5  3.3300     69.5  3.8350     71.0  2.5380

[Figure: Scatterplot of FEV vs Age, Hgt (panel variable: Sex)]
Results for Sex = 1
Hgt   Mean FEV   Hgt   Mean FEV   Hgt   Mean FEV   Hgt   Mean FEV
47.0  0.981      48.0  1.270      49.5  1.4250     50.0  1.794
50.5  1.536      51.0  1.683      51.5  1.514      52.0  1.5915
52.5  1.7100     53.0  1.6646     53.5  1.974      54.0  1.7809
54.5  1.8380     55.0  1.8034     55.5  1.8070     56.0  2.025
56.5  1.879      57.0  2.0875     57.5  1.829      58.0  2.0169
58.5  2.131      59.0  2.350      59.5  2.515      60.0  2.279
60.5  2.3253     61.0  2.4699     61.5  2.5410     62.0  2.658
62.5  2.829      63.0  2.877      63.5  2.757      64.0  2.697
64.5  3.100      65.0  2.770      65.5  3.0343     66.0  3.115
66.5  3.353      67.0  3.779      67.5  3.612      68.0  3.878
68.5  3.872      69.0  4.022      69.5  3.743      70.0  4.197
70.5  3.931      71.0  4.310      71.5  4.7200     72.0  4.361
72.5  4.2720     73.0  5.255      73.5  3.6450     74.0  4.654
Descriptive Statistics: FEV by smoking status

Results for Sex = 0
Smoke   Mean     StDev
0       2.3792   0.6393
1       2.9659   0.4229

Results for Sex = 1
Smoke   Mean     StDev
0       2.7344   0.9741
1       3.743    0.889

[Figure: Boxplot of FEV by Smoke and Sex]

2.25
Looking at the scatterplot of FEV vs. Age, we find that FEV increases with age for both boys and girls, at
approximately the same rate. However, the spread (standard deviation) of FEV values appears to be
higher in the male group than in the female group.
2.26
Variable          Mean     StDev    Median
Sat. Fat – DR     14.557   7.536    12.000
Sat. Fat – FFQ    7.898    9.695    3.159
Tot. Fat – DR     64.238   9.894    63.500
Tot. Fat – FFQ    15.21    27.00    1.00
Alcohol – DR      2.470    6.314    0.000
Alcohol – FFQ     8.951    12.255   4.550
Calories – DR     1619.9   323.4    1606.0
Calories – FFQ    1371.7   482.1    1297.6

[Figures: Boxplot of Calories (DR and FFQ); Boxplot of Sat. Fat, Tot. Fat, and Alcohol (DR and FFQ)]
2.27
[Figure: Scatterplot of DR vs. FFQ values; panels Sat. Fat – DR * Sat. Fat – FFQ, Tot. Fat – DR * Tot. Fat – FFQ, Alcohol – DR * Alcohol – FFQ, Calories – DR * Calories – FFQ]
If FFQ were a perfect substitute for DR, the points would line up in a straight line. If the two were
unrelated, then we would expect to see a random pattern in each panel. The scatterplots shown above seem
to suggest that the DR and FFQ values are not highly related.
2.28
The 5×5 tables below show the number of people classified into a particular combination of quintile
categories. For each table, the rows represent the quintiles of the DR, and the columns represent quintiles of
the FFQ. Overall, we get the same impression that there is weak concordance between the two measures.
However, the agreement between the two measures is greatest for alcohol
consumption. We also note the relatively high level of agreement at the extremes of each nutrient; for
example, the (1,1) and (5,5) cells generally contain the highest values.
Tabulated statistics: SFDQuin, SFFQuin
Rows: SFDQuin   Columns: SFFQuin
          1     2     3     4     5   All
  1      15     8     9     2     1    35
  2      10     6     6     8     5    35
  3       4     7     8     9     6    34
  4       6    10     6     9     4    35
  5       0     3     6     7    18    34
All      35    34    35    35    34   173
Cell Contents: Count

Tabulated statistics: TFDQuin, TFFQuin
Rows: TFDQuin   Columns: TFFQuin
          1     2     3     4     5   All
  1      13     9     8     5     1    36
  2       9     5     7    10     3    34
  3       4    10     8     6     6    34
  4       8     6     3     9     9    35
  5       1     5     8     5    15    34
All      35    35    34    35    34   173
Cell Contents: Count

Tabulated statistics: AlcDQuin, AlcFQuin
Rows: AlcDQuin   Columns: AlcFQuin
          1     2     3     4     5   All
  1      28     5     2     0     0    35
  2       6    23     6     0     0    35
  3       0     9    14    10     1    34
  4       0     1    10    16     8    35
  5       0     0     0     8    26    34
All      34    38    32    34    35   173
Cell Contents: Count

Tabulated statistics: CalDQuin, CalFQuin
Rows: CalDQuin   Columns: CalFQuin
          1     2     3     4     5   All
  1      10    11     8     4     2    35
  2      11     4     9     7     4    35
  3       5     9     6     8     6    34
  4       4     8     7     6    10    35
  5       5     3     4    10    12    34
All      35    35    34    35    34   173
Cell Contents: Count
2.29
Descriptive Statistics: Total Fat Density DR, Total Fat Density FFQ
Variable                 Mean     StDev   Median
Total Fat Density DR     38.066   4.205   38.646
Total Fat Density FFQ    36.855   6.729   36.366

[Figure: Scatterplot of Total Fat Density DR vs Total Fat Density FFQ]
2.30
The concordance for the quintiles of nutrient density does appear somewhat stronger than for the
quintiles of raw nutrient data. In the table below, we see that 19 + 14 + 10 + 7 + 11 = 61 individuals were in
the same quintile on both measures, compared to 50 people in the total fat table from question 2.28.
Tabulated statistics: Dens DR Quin, Dens FFQ Quin
Rows: Dens DR Quin   Columns: Dens FFQ Quin
          1     2     3     4     5   All
  1      19     7     6     2     1    35
  2       5    14     5     6     5    35
  3       4     8    10     6     6    34
  4       6     4     7     7    11    35
  5       1     2     6    14    11    34
All      35    35    34    35    34   173
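This R sketch computes the concordance counts in 2.30 as the sum of the main diagonal of each quintile cross-classification. The layout assumed is rows = DR quintile and columns = FFQ quintile. The diagonal count does not depend on which margin is placed on the rows.

```r
# Quintile cross-classifications (rows = DR quintile, columns = FFQ quintile).
total_fat <- matrix(c(13, 9, 8, 5, 1,
                      9, 5, 7, 10, 3,
                      4, 10, 8, 6, 6,
                      8, 6, 3, 9, 9,
                      1, 5, 8, 5, 15), nrow = 5, byrow = TRUE)
fat_density <- matrix(c(19, 7, 6, 2, 1,
                        5, 14, 5, 6, 5,
                        4, 8, 10, 6, 6,
                        6, 4, 7, 7, 11,
                        1, 2, 6, 14, 11), nrow = 5, byrow = TRUE)

same_quintile <- function(tab) sum(diag(tab))   # people on the main diagonal
rbind(total_fat = c(n_same = same_quintile(total_fat),
                    prop_same = same_quintile(total_fat) / sum(total_fat)),
      fat_density = c(n_same = same_quintile(fat_density),
                      prop_same = same_quintile(fat_density) / sum(fat_density)))
```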
2.31
We find that exposed children (Lead_type = 2) are somewhat younger and more likely to be male (Sex = 1),
compared to unexposed children. The boxplot shows all three lead types, but we are only interested in types 1
and 2.

Descriptive Statistics: Age
Lead_type   Mean    StDev   Median
1           893.8   360.2   905.0
2           776.3   329.5   753.5

Tabulated statistics: Lead_type, Sex (count and row percent)
Rows: Lead_type   Columns: Sex
        1             2             All
1       46 (58.97)    32 (41.03)    78 (100.00)
2       17 (70.83)     7 (29.17)    24 (100.00)

[Figure: Boxplot of Age by Lead_type (types 1, 2, 3)]
2.32
The exposed children have somewhat lower mean and median IQ scores compared to the unexposed
children, but the differences don't appear to be very large.

Descriptive Statistics: Iqv, Iqp
Variable   Lead_type   Mean     StDev   Median
Iqv        1           85.14    14.69   85.00
           2           84.33    10.55   81.50
Iqp        1           102.71   16.79   101.00
           2           95.67    11.34   97.00

[Figure: Boxplot of Iqv, Iqp by Lead_type (types 1, 2, 3)]
The coefficient of variation (CV) is given by $100\% \times (s/\bar{x})$, where $\bar{x}$ and $s$ are computed separately for
each sample. We compute $\bar{x}$, $s$, and the CV for each sample using the following function in R
(the function body is reconstructed to reproduce the output shown):
> cv_est <- function(x) {
+   cat("Mean, SD, CV are\n")
+   c(mean(x), sd(x), 100 * sd(x) / mean(x))
+ }
> cv_est(c(2.22, 1.88))
Mean, SD, CV are
[1]  2.0500000  0.2404163 11.7276247
The results are shown in the table below:

APC resistance: coefficient of variation
Sample number   A      B      Mean    SD      CV (%)
1               2.22   1.88   2.05    0.240   11.7
2               3.42   3.59   3.505   0.120   3.4
3               3.68   3.01   3.345   0.474   14.2
4               2.64   2.37   2.505   0.191   7.6
5               2.68   2.26   2.47    0.297   12.0
6               3.29   3.04   3.165   0.177   5.6
7               3.85   3.57   3.71    0.198   5.3
8               2.24   2.29   2.265   0.035   1.6
9               3.25   3.39   3.32    0.099   3.0
10              3.3    3.16   3.23    0.099   3.1
Average CV                                    6.7
2.34
To obtain the average CV, we average the sample-specific CVs over the 10 samples. The average CV is 6.7%,
which indicates excellent reproducibility.
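The same per-sample CVs and their average can be computed in a vectorized way. For a pair of values, the sample SD with divisor n − 1 = 1 reduces to |A − B|/√2.

```r
# Duplicate APC resistance measurements (columns A and B) for the 10 samples in 2.33.
a <- c(2.22, 3.42, 3.68, 2.64, 2.68, 3.29, 3.85, 2.24, 3.25, 3.30)
b <- c(1.88, 3.59, 3.01, 2.37, 2.26, 3.04, 3.57, 2.29, 3.39, 3.16)

pair_mean <- (a + b) / 2
pair_sd <- abs(a - b) / sqrt(2)        # SD of two values (divisor n - 1 = 1)
cv <- 100 * pair_sd / pair_mean        # CV (%) for each sample
round(cv, 1)
mean(cv)                               # average CV across the 10 samples
```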
2.35
We compute the mean and standard deviation of pod weight for both inoculated (I) and uninoculated (U)
plants. The results are given as follows:

     Mean   SD     n
I    1.63   0.42   8
U    1.08   0.51   8

2.36
We plot the distribution of I and U pod weights using a dot plot from MINITAB.
[Figure: MINITAB dot plots of pod weight for the I and U plants; horizontal axis from 0.70 to 2.45]
2.37
Although there is some overlap in the distributions, it appears that the I plants tend to have higher pod
weights than the U plants. We will discuss t tests in Chapter 8 to assess whether there are “statistically
significant” differences in mean pod weights between the 2 groups.
2.38-2.40 For lumbar spine bone mineral density, we have the following:

ID        A       B       C              PY Diff   Pack Year Group
1002501   −0.05   0.785   −6.36942675    13.75     2
1015401   −0.12   0.95    −12.6315789    48        5
1027601   −0.24   0.63    −38.0952381    20.5      3
1034301   0.04    0.83    4.81927711     29.75     3
1121202   −0.19   0.685   −27.7372263    25        3
1162502   −0.03   0.845   −3.55029586    5         1
1188701   −0.08   0.91    −8.79120879    42        5
1248202   −0.1    0.71    −14.084507     15        2
1268301   0.15    0.905   16.5745856     9.5       1
1269402   −0.12   0.95    −12.6315789    39        4
1273101   −0.1    0.81    −12.345679     14.5      2
1323501   0.09    0.755   11.9205298     23.25     3
1337102   −0.08   0.67    −11.9402985    18.5      2
1467301   −0.07   0.665   −10.5263158    39        4
1479401   −0.03   0.715   −4.1958042     25.5      3
1494101   0.05    0.735   6.80272109     8         1
1497701   0.04    0.75    5.33333333     10        2
1505502   −0.04   0.81    −4.9382716     32        4
1519402   −0.01   0.645   −1.5503876     13.2      2
1521701   −0.06   0.74    −8.10810811    30        4
1528201   −0.11   0.695   −15.8273381    20.25     3
1536201   −0.05   0.865   −5.78034682    36.25     4
1536701   0.03    0.635   4.72440945     12        2
1541902   −0.12   0.98    −12.244898     11.25     2
1543602   0.03    0.885   3.38983051     8         1
1596702   0.01    0.955   1.04712042     14        2
1597002   0.07    0.705   9.92907801     17.3      2
1597601   0.13    0.775   16.7741935     12        2
1607901   −0.03   0.485   −6.18556701    43.2      5
1608801   −0.21   0.585   −35.8974359    48        5
1628601   −0.05   0.795   −6.28930818    5.35      1
1635901   0.03    0.945   3.17460317     8         1
1637901   −0.05   0.775   −6.4516129     6         1
1640701   −0.01   0.855   −1.16959064    28        3
1643602   0.11    0.555   19.8198198     64.5      5
1647502   −0.07   0.545   −12.8440367    11.3      2
1648701   −0.08   0.94    −8.5106383     15.75     2
1657301   −0.08   0.72    −11.1111111    21        3
1671001   −0.07   0.895   −7.82122905    39        4
1672702   0.1     0.87    11.4942529     18.75     2
2609801   −0.1    0.9     −11.1111111    48        5

C: Mean −4.9496682, Median −6.2893082, Sd 12.4834202
Descriptive Statistics: C
Pack Year Group   Mean     StDev   Median
1                 1.95     8.26    3.17
2                 −2.18    10.45   −3.96
3                 −10.17   16.69   −7.65
4                 −8.30    2.89    −7.96
5                 −9.13    17.77   −9.95

[Figure: Individual Value Plot of C by Pack Year Group]
It appears that the value of C generally decreases as the difference in pack-years gets larger. This
suggests that lumbar spine bone mineral density is lower in the heavier-smoking twin, and hence
that tobacco use has a negative relationship with bone mineral density.
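In every row of the 2.38-2.40 table, C equals 100 × A/B. For example, 100 × (−0.05)/0.785 = −6.37. This R sketch rebuilds C from the A and B columns and reproduces the overall mean and median and the pack-year group means with tapply(). The rows are typed in the table's ID order.

```r
# Lumbar spine data from the 2.38-2.40 table (rows in ID order).
A <- c(-0.05, -0.12, -0.24, 0.04, -0.19, -0.03, -0.08, -0.10, 0.15, -0.12,
       -0.10, 0.09, -0.08, -0.07, -0.03, 0.05, 0.04, -0.04, -0.01, -0.06,
       -0.11, -0.05, 0.03, -0.12, 0.03, 0.01, 0.07, 0.13, -0.03, -0.21,
       -0.05, 0.03, -0.05, -0.01, 0.11, -0.07, -0.08, -0.08, -0.07, 0.10, -0.10)
B <- c(0.785, 0.95, 0.63, 0.83, 0.685, 0.845, 0.91, 0.71, 0.905, 0.95,
       0.81, 0.755, 0.67, 0.665, 0.715, 0.735, 0.75, 0.81, 0.645, 0.74,
       0.695, 0.865, 0.635, 0.98, 0.885, 0.955, 0.705, 0.775, 0.485, 0.585,
       0.795, 0.945, 0.775, 0.855, 0.555, 0.545, 0.94, 0.72, 0.895, 0.87, 0.9)
group <- c(2, 5, 3, 3, 3, 1, 5, 2, 1, 4,
           2, 3, 2, 4, 3, 1, 2, 4, 2, 4,
           3, 4, 2, 2, 1, 2, 2, 2, 5, 5,
           1, 1, 1, 3, 5, 2, 2, 3, 4, 2, 5)   # pack-year group

C <- 100 * A / B                      # reproduces the C column
round(c(mean = mean(C), median = median(C)), 2)
round(tapply(C, group, mean), 2)      # mean C within each pack-year group
```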
2.41-2.43
For femoral neck BMD, we find . . .
A       B       C
−0.04   0.7     −5.714285714
−0.1    0.69    −14.49275362
0.01    0.635   1.57480315
0.05    0.665   7.518796992
−0.16   0.62    −25.80645161
−0.06   0.53    −11.32075472
−0.05   0.805   −6.211180124
−0.07   0.525   −13.33333333
0.12    0.71    16.90140845
−0.03   0.885   −3.389830508
…       …       …
0.04    0.72    5.555555556
−0.09   0.805   −11.18012422
0.04    0.44    9.090909091
−0.05   0.665   −7.518796992
−0.03   0.635   −4.724409449
0.14    0.64    21.875
0.12    0.73    16.43835616
−0.09   0.765   −11.76470588

C: Mean −0.466252903, Median −2.941176471, Sd 14.16185979

Descriptive Statistics: C_Fem
Pack Year Group   Mean    StDev   Median
1                 4.68    11.38   7.87
2                 4.51    14.83   3.68
3                 −4.78   11.44   −4.76
4                 −3.56   14.05   −5.36
5                 −9.24   16.00   −8.99

[Figure: Individual Value Plot of C_Fem by Pack Year Group]
We get the same overall impression as before, that BMD decreases as tobacco use increases. The relationship may
be a bit stronger using the femoral neck measurements, as we see a difference of approximately 14 units (4.68 − (−9.24))
in the mean value of C between Pack Year Group 1 (< 10 py) and Pack Year Group 5 (> 40 py). Using the
lumbar spine data, this difference was approximately 11 units.
2.44-2.46
Using femoral shaft BMD, we find the following:
A       B       C
0.04    1.02    3.921568627
0.12    1.05    11.42857143
−0.19   0.955   −19.89528796
−0.09   1.075   −8.372093023
−0.18   1.05    −17.14285714
−0.07   1.095   −6.392694064
0.07    1.195   5.857740586
−0.01   1.045   −0.956937799
0.08    1.11    7.207207207
…       …       …
−0.1    1.17    −8.547008547
−0.08   1.01    −7.920792079
−0.03   0.875   −3.428571429
−0.04   0.68    −5.882352941
0.1     1.16    8.620689655
−0.2    1.32    −15.15151515
−0.03   1.045   −2.870813397
−0.04   1.04    −3.846153846
0.06    1.28    4.6875

C: Mean −3.241805211, Median −2.870813397, Sd 11.29830441

Descriptive Statistics: C_Shaft
Pack Year Group   Mean    StDev   Median
1                 −0.98   7.67    −2.74
2                 0.25    6.49    1.03
3                 −8.55   9.77    −9.40
4                 −1.92   11.03   −3.80
5                 −8.26   21.61   0.63

[Figure: Individual Value Plot of C_Shaft by Pack Year Group]
When using the femoral shaft BMD data, the relationship between BMD and tobacco is much less clear. The lowest
mean (and median) C value occurs in group 3, and it is hard to tell if any relationship exists between pack-year
group and C.
2.47
We first read the data set LVM and show its first observations:
> require(xlsx)
> lvm <- read.xlsx("LVM.xlsx", sheetIndex = 1)   # file name assumed; adjust to your copy
> head(lvm)
  ID lvmht27 bpcat gender   age   BMI
1  1  31.281     1      1 17.63 21.45
2  2  36.780     1      2 16.11 19.78
3  6  20.660     1      2 17.03 20.58
4 10  44.222     1      2 11.50 25.34
5 16  23.302     1      1 11.90 17.30
6 20  27.735     1      2 10.47 19.16
We use the R function tapply to calculate the mean of LVMI by blood pressure group:
> tapply(lvm$lvmht27, lvm$bpcat, mean)
       1        2        3
29.34266 33.79100 34.11569
2.48
We also use the R function tapply to calculate the geometric mean of LVMI by blood pressure group:
> exp(tapply(log(lvm$lvmht27), lvm$bpcat, mean))
       1        2        3
28.60586 33.34814 32.88941
2.49
> boxplot(lvm$lvmht27 ~ lvm$bpcat,
+         main = "Box plot of LVMI by blood pressure group")
[Figure: Box plot of LVMI by blood pressure group (groups 1, 2, 3)]
2.50
Since the box plots by blood pressure group are skewed, the geometric mean provides a more appropriate
measure of location for this type of data.
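For positive data, the AM-GM inequality guarantees that the geometric mean is never larger than the arithmetic mean. A few large values in the right tail raise the arithmetic mean more than the geometric mean. This is consistent with the geometric means in 2.48 sitting below the arithmetic means in 2.47. The sketch below uses only the six LVMI values printed by head(lvm) in 2.47, as a small illustration.

```r
# First six LVMI values printed by head(lvm) in 2.47 (illustration only).
lvmi <- c(31.281, 36.780, 20.660, 44.222, 23.302, 27.735)

geometric_mean <- function(x) exp(mean(log(x)))   # same idea as exp(tapply(log(...), ..., mean))
c(arithmetic = mean(lvmi), geometric = geometric_mean(lvmi))
```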
PROBABILITY
3.1
$A_1 \cup A_2$ means that at least one parent has influenza.
3.2
$A_1 \cap A_2$ means that both parents have influenza.
3.3
No. Both children can have influenza.
3.4
$A_3 \cup B$ means that at least one child has influenza, because if $A_3$ occurs, then $B$ must occur. Therefore,
$A_3 \cup B = B$.
3.5
$A_3 \cap B$ means that the first child has influenza. Therefore, $A_3 \cap B = A_3$.
3.6
$C = A_1 \cup A_2$
3.7
$D = B \cup C$
3.8
$\bar{A}_1$ means that the mother does not have influenza.
3.9
$\bar{A}_2$ means that the father does not have influenza.
3.10
$\bar{C} = \bar{A}_1 \cap \bar{A}_2$
3.11
$\bar{D} = \bar{B} \cap \bar{C}$
Therefore, the events are not independent.
3.12
3.13
3.14
CHAPTER 3/PROBABILITY
3.15
3.16
3.17
Let A = {77-year-old man is affected}, B = {76-year-old woman is affected}, C = {82-year-old woman is
affected}. Assuming independence, it follows that $\Pr(A \cap B \cap C) = .049 \times .023 \times .078 = 8.8 \times 10^{-5}$.
We need to compute $\Pr(B \cup C)$. From the addition law,
$$\Pr(B \cup C) = \Pr(B) + \Pr(C) - \Pr(B \cap C) = .023 + .078 - (.023 \times .078) = .099$$
3.18
We wish to compute $\Pr(A \cup B \cup C)$. We have
3.19
We wish to compute $\Pr(E)$, where E = {exactly one of the three individuals is affected}. Hence
$$\Pr(E) = 0.049(1 - 0.023)(1 - 0.078) + (1 - 0.049)(0.023)(1 - 0.078) + (1 - 0.049)(1 - 0.023)(0.078) = 0.0441 + 0.0202 + 0.0725 = 0.1368$$
3.20
We have
$$\Pr(\text{affected individual is a woman} \mid E) = \frac{.0202 + .0725}{.1368} = \frac{.0926}{.1368} = .677$$
3.21
We have
$$\Pr(\text{both affected individuals are women} \mid \text{exactly two affected}) = \frac{(1 - .049)(.023)(.078)}{.049(.023)(1 - .078) + .049(1 - .023)(.078) + (1 - .049)(.023)(.078)} = \frac{.00171}{.00104 + .00373 + .00171} = \frac{.00171}{.00648} = .263$$
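An alternative way to reach the answers to Problems 3.19 to 3.22 is to list all 2^3 affected/unaffected patterns. Each pattern is weighted under independence, and conditional probabilities are then computed by restricting to the conditioning event and renormalizing. The sketch below uses the prevalences quoted in the text. The data-frame layout is an illustrative choice, not part of the original solution.

```r
# Prevalences quoted in the text; independence assumed because the three are unrelated
p <- c(man77 = 0.049, woman76 = 0.023, woman82 = 0.078)

# All 2^3 affected (1) / unaffected (0) patterns, one row per pattern
grid <- expand.grid(man77 = 0:1, woman76 = 0:1, woman82 = 0:1)

# Probability of each pattern: product of p (affected) or 1 - p (unaffected)
grid$prob <- apply(grid, 1, function(z) prod(ifelse(z == 1, p, 1 - p)))
n_aff <- grid$man77 + grid$woman76 + grid$woman82

p_one <- sum(grid$prob[n_aff == 1])   # exactly one affected
p_two <- sum(grid$prob[n_aff == 2])   # exactly two affected

# Conditional probabilities: restrict to the conditioning event, then renormalize
p_woman_given_one   <- sum(grid$prob[n_aff == 1 & grid$man77 == 0]) / p_one
p_women_given_two   <- sum(grid$prob[n_aff == 2 & grid$man77 == 0]) / p_two
p_under80_given_two <- sum(grid$prob[n_aff == 2 & grid$woman82 == 0]) / p_two

print(round(c(p_one, p_woman_given_one, p_women_given_two, p_under80_given_two), 3))
```

The last line corresponds to Problem 3.22. In that problem, "both younger than 80" means the 77-year-old man and the 76-year-old woman are the two affected people.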
3.22
We have
$$\Pr(\text{both} < 80 \text{ years old} \mid \text{exactly two affected}) = \frac{.049(.023)(1 - .078)}{.00648} = \frac{.00104}{.00648} = .160$$
3.23
$$\Pr(\text{man affected} \mid \text{woman affected}) = \frac{.0015}{.023} = .065$$
This is higher than the unconditional probability for the man in Table 3.5 (.049), indicating that these are dependent events.
3.24
$$\Pr(\text{woman affected} \mid \text{man affected}) = \frac{.0015}{.049} = .031$$
This value is also higher than the unconditional probability for the woman in Table 3.5 (.023). If some common environmental factor is associated with Alzheimer's disease, then it would make sense that the conditional probability is higher than the unconditional probability.
3.25
Let $A$ = {man affected}, $B$ = {woman affected}. We have
$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) = .049 + .023 - .0015 = .0705$$
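The married-couple calculations in Problems 3.23 to 3.25 reduce to a few lines. The sketch below uses the two unconditional probabilities and the joint probability .0015 quoted above. It also computes the product of the margins, which is what the joint probability would be under independence, so that the comparison made in the text is explicit.

```r
p_man   <- 0.049    # unconditional probability for the man (Table 3.5)
p_woman <- 0.023    # unconditional probability for the woman (Table 3.5)
p_both  <- 0.0015   # joint probability that both are affected (given in the problem)

p_man_given_woman <- p_both / p_woman         # definition of conditional probability
p_woman_given_man <- p_both / p_man
p_either          <- p_man + p_woman - p_both # addition law

# Under independence, the joint probability would be the product of the margins
p_both_if_indep <- p_man * p_woman

print(c(p_man_given_woman, p_woman_given_man, p_either, p_both_if_indep))
```

The given joint probability (.0015) is larger than the independence value, which is why both conditional probabilities exceed the unconditional ones.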
3.26
Let $\Pr(A)$ denote the overall probability of Alzheimer's disease. We have that
$$\Pr(A) = \Pr(\text{65-69 M}) \times \Pr(A \mid \text{65-69 M}) + \cdots + \Pr(\text{85+ F}) \times \Pr(A \mid \text{85+ F}) = .05 \times .016 + .10 \times .000 + \cdots + .06 \times .279 = .061$$
Therefore, the expected overall prevalence in the community is 6.1%.
3.27
The expected number of cases with Alzheimer's disease $= 1000 \times .061 = 61$.
3.28
Let $A$, $B$, and $C$ represent influenza status for the 3-, 5-, and 7-year-old, respectively, where $A = 1$ if influenza and $A = 0$ otherwise, and $B$ and $C$ are defined similarly.
We wish to compute $\Pr(A \cup B \cup C)$. Assuming that the children's influenza statuses are independent,
$$\Pr(A \cup B \cup C) = 1 - [1 - \Pr(A)] \times [1 - \Pr(B)] \times [1 - \Pr(C)] = 1 - (1 - 0.0378)(1 - 0.0170)^2 = 1 - 0.9622(0.9830)^2 = 1 - 0.9298 = 0.070$$
Thus, there is a 7% probability that at least one of the three children gets influenza.
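The complement-rule calculation can be reproduced in a few lines. The sketch uses the per-child probabilities from the solution above. Like the solution's formula, it assumes the three children's influenza statuses are independent.

```r
# Influenza probabilities used in the text for the 3-, 5-, and 7-year-old
p_flu <- c(age3 = 0.0378, age5 = 0.0170, age7 = 0.0170)

# Complement rule, assuming independence: 1 - Pr(no child gets influenza)
p_at_least_one <- 1 - prod(1 - p_flu)

print(round(p_at_least_one, 3))
```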
3.29
We use the total probability rule.
Let $D$ = {a 3-4-year-old gets influenza}. We have:
$$\Pr(D) = 0.0378(0.80) + 0.0569(0.20)(2) = 0.0302 + 0.0228 = 0.053$$
Thus, 5.3% of 3-4-year-olds get influenza.
3.30
Let $E$ = {a 5-8-year-old gets influenza}. We have:
$$\Pr(E) = 0.0170(0.70) + 0.0515(0.30)(2) = 0.0119 + 0.0309 = 0.043$$
Thus, 4.3% of 5-8-year-olds get influenza.
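Both total-probability calculations in Problems 3.29 and 3.30 have the same form: a weighted average of the vaccinated and unvaccinated influenza rates. The helper below, `total_prob`, is hypothetical and written only for illustration. Its inputs are taken exactly as they appear in the text, where the unvaccinated rate is the table value doubled.

```r
# Hypothetical helper: total probability over vaccinated / unvaccinated children
total_prob <- function(rate_vacc, rate_unvacc, p_vaccinated) {
  rate_vacc * p_vaccinated + rate_unvacc * (1 - p_vaccinated)
}

p_D <- total_prob(0.0378, 2 * 0.0569, 0.80)  # 3-4-year-olds
p_E <- total_prob(0.0170, 2 * 0.0515, 0.70)  # 5-8-year-olds

print(round(c(D = p_D, E = p_E), 3))
```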
3.31
We use Bayes' theorem. Let $V$ = {child is vaccinated} and $I$ = {child gets influenza}. We wish to compute $\Pr(V \mid I)$. We have
$$\Pr(V \mid I) = \frac{\Pr(I \mid V)\Pr(V)}{\Pr(I \mid V)\Pr(V) + \Pr(I \mid \bar{V})\Pr(\bar{V})}$$
From Table 3.7 and the conditions of the problem, for a 5-8-year-old, $\Pr(I \mid V) = 0.0170$ and $\Pr(I \mid \bar{V}) = 0.0515 \times 2 = 0.1030$. Also, $\Pr(V) = 0.70$ and $\Pr(\bar{V}) = 0.30$. Thus,
$$\Pr(V \mid I) = \frac{0.0170(0.70)}{0.0170(0.70) + 0.1030(0.30)} = \frac{0.0119}{0.0119 + 0.0309} = \frac{0.0119}{0.0428} = 0.278$$
Thus, there is only a 28% probability that this child was vaccinated.
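The Bayes' theorem calculation above can be checked directly. The numerator is the vaccinated branch of the total-probability sum, and the denominator is the full sum. The inputs are those stated in the solution.

```r
p_I_given_V    <- 0.0170       # influenza rate, vaccinated 5-8-year-old
p_I_given_notV <- 2 * 0.0515   # unvaccinated rate: table value doubled, as in the text
p_V            <- 0.70         # Pr(vaccinated)

# Bayes' theorem: Pr(V | I) = Pr(I | V) Pr(V) / Pr(I)
p_I         <- p_I_given_V * p_V + p_I_given_notV * (1 - p_V)
p_V_given_I <- p_I_given_V * p_V / p_I

print(round(p_V_given_I, 3))
```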
3.32
The probability that both siblings are affected is $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.
3.33
The probability that exactly one sibling is affected is $2 \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{2}$.
3.34
The probability that neither sibling will be affected is $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.
3.35
The probability that the younger child is affected should not be influenced by whether or not the older
child is affected. Thus, the probability of the younger child being affected remains at $\frac{1}{2}$.
3.36
The events A, B are independent because whether or not a child is affected does not influence the
outcome for other children in the family.
3.37
The probability that both siblings are affected $= \left(\frac{1}{4}\right)^2 = \frac{1}{16}$
3.38
The probability that exactly one sibling is affected $= 2 \times \frac{1}{4} \times \frac{3}{4} = \frac{3}{8}$
3.39
The probability that neither sibling is affected $= \left(\frac{3}{4}\right)^2 = \frac{9}{16}$
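For two siblings who are independent given the mode of inheritance, the number affected is binomial with size 2. The sketch below uses the per-child probabilities from the text: 1/2 for a dominant trait and 1/4 for an autosomal recessive trait. It reproduces the answers to Problems 3.32 to 3.34 and 3.37 to 3.39 in one step.

```r
# Number affected among two siblings: Binomial(2, p), siblings independent given the mode
dom <- dbinom(0:2, size = 2, prob = 1/2)   # Pr(0), Pr(1), Pr(2) affected, dominant
ar  <- dbinom(0:2, size = 2, prob = 1/4)   # same, autosomal recessive

print(rbind(dominant = dom, recessive = ar))
```

The sex-linked case with one male and one female sibling does not fit this model, because the two siblings then have different probabilities of being affected.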
3.40
The probability that both siblings are affected = 0, because the female sibling cannot get the disease.
3.41
The probability that exactly one sibling is affected $= \frac{1}{2}$, since only the male sibling can be affected.
3.42
The probability that neither is affected $= \frac{1}{2} \times 1 = \frac{1}{2}$.
3.43
$$\Pr(\text{both affected}) = \left(\frac{1}{2}\right)^2 = \frac{1}{4}$$
3.44
$$\Pr(\text{exactly one affected}) = 2 \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{2}$$
3.45
$$\Pr(\text{neither affected}) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$$
3.46
Bayes' theorem is used here. Dominant is denoted by DOM, autosomal recessive by AR, and sex-linked by SL. Let $A$ be the event that two male siblings are affected. The posterior probability is given by
$$\Pr(\text{DOM} \mid A) = \frac{\Pr(A \mid \text{DOM}) \Pr(\text{DOM})}{\Pr(A \mid \text{DOM}) \Pr(\text{DOM}) + \Pr(A \mid \text{AR}) \Pr(\text{AR}) + \Pr(A \mid \text{SL}) \Pr(\text{SL})}$$
We also know that $\Pr(\text{DOM}) = \Pr(\text{AR}) = \Pr(\text{SL}) = \frac{1}{3}$ from the conditions stated in the problem. Thus,
$$\Pr(\text{DOM} \mid A) = \frac{\Pr(A \mid \text{DOM})}{\Pr(A \mid \text{DOM}) + \Pr(A \mid \text{AR}) + \Pr(A \mid \text{SL})}$$
Finally, we know from Problems 3.32, 3.37, and 3.43 that
$$\Pr(A \mid \text{DOM}) = \frac{1}{4}, \quad \Pr(A \mid \text{AR}) = \frac{1}{16}, \quad \Pr(A \mid \text{SL}) = \frac{1}{4}$$
Thus,
$$\Pr(\text{DOM} \mid A) = \frac{\frac{1}{4}}{\frac{1}{4} + \frac{1}{16} + \frac{1}{4}} = \frac{\frac{1}{4}}{\frac{9}{16}} = \frac{4}{9}$$
Similarly,
$$\Pr(\text{AR} \mid A) = \frac{\Pr(A \mid \text{AR})}{\Pr(A \mid \text{DOM}) + \Pr(A \mid \text{AR}) + \Pr(A \mid \text{SL})} = \frac{\frac{1}{16}}{\frac{9}{16}} = \frac{1}{9}$$
$$\Pr(\text{SL} \mid A) = \frac{\Pr(A \mid \text{SL})}{\Pr(A \mid \text{DOM}) + \Pr(A \mid \text{AR}) + \Pr(A \mid \text{SL})} = \frac{\frac{1}{4}}{\frac{9}{16}} = \frac{4}{9}$$
Thus, the dominant and sex-linked modes of inheritance are the most likely, with the autosomal recessive mode being less likely.
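With equal priors of 1/3 on the three modes, the posterior is the likelihood vector renormalized to sum to 1. The sketch below writes out Bayes' theorem with the priors included explicitly, so the cancellation used in the solution can be seen numerically. The likelihoods are those listed above for two affected male siblings.

```r
prior <- c(DOM = 1/3, AR = 1/3, SL = 1/3)    # equal priors, as stated in the problem
lik_A <- c(DOM = 1/4, AR = 1/16, SL = 1/4)   # Pr(both male siblings affected | mode)

# Bayes' theorem: posterior proportional to likelihood x prior
post_A <- lik_A * prior / sum(lik_A * prior)

print(post_A)
```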
3.47
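Let $B$ = {exactly one of two male siblings is affected}. From Problems 3.33, 3.38, and 3.44,
$$\Pr(B \mid \text{DOM}) = \frac{1}{2}, \quad \Pr(B \mid \text{AR}) = \frac{3}{8}, \quad \Pr(B \mid \text{SL}) = \frac{1}{2}$$
Thus, from Bayes' theorem, the posterior probabilities are given by
$$\Pr(\text{DOM} \mid B) = \frac{\Pr(B \mid \text{DOM})}{\Pr(B \mid \text{DOM}) + \Pr(B \mid \text{AR}) + \Pr(B \mid \text{SL})} = \frac{\frac{1}{2}}{\frac{1}{2} + \frac{3}{8} + \frac{1}{2}} = \frac{\frac{1}{2}}{\frac{11}{8}} = \frac{4}{11}$$
$$\Pr(\text{AR} \mid B) = \frac{\Pr(B \mid \text{AR})}{\Pr(B \mid \text{DOM}) + \Pr(B \mid \text{AR}) + \Pr(B \mid \text{SL})} = \frac{\frac{3}{8}}{\frac{11}{8}} = \frac{3}{11}$$
$$\Pr(\text{SL} \mid B) = \frac{\Pr(B \mid \text{SL})}{\Pr(B \mid \text{DOM}) + \Pr(B \mid \text{AR}) + \Pr(B \mid \text{SL})} = \frac{\frac{1}{2}}{\frac{11}{8}} = \frac{4}{11}$$
Here the three genetic types are about equally likely.
3.48
Let $C$ = {one male and one female sibling are both affected}. The sex of the siblings is only relevant for sex-linked disease. Thus, from Problems 3.32, 3.37, and 3.40,
$$\Pr(C \mid \text{DOM}) = \frac{1}{4}, \quad \Pr(C \mid \text{AR}) = \frac{1}{16}, \quad \Pr(C \mid \text{SL}) = 0$$
Thus,
$$\Pr(\text{DOM} \mid C) = \frac{\Pr(C \mid \text{DOM})}{\Pr(C \mid \text{DOM}) + \Pr(C \mid \text{AR}) + \Pr(C \mid \text{SL})} = \frac{\frac{1}{4}}{\frac{1}{4} + \frac{1}{16}} = \frac{\frac{1}{4}}{\frac{5}{16}} = \frac{4}{5}$$
$$\Pr(\text{AR} \mid C) = \frac{\Pr(C \mid \text{AR})}{\Pr(C \mid \text{DOM}) + \Pr(C \mid \text{AR}) + \Pr(C \mid \text{SL})} = \frac{\frac{1}{16}}{\frac{5}{16}} = \frac{1}{5}$$
$$\Pr(\text{SL} \mid C) = 0$$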
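The two posterior calculations in Problems 3.47 and 3.48 follow the same pattern. The sketch below therefore wraps Bayes' theorem in a small helper, `posterior`. This is a hypothetical name introduced only for illustration, and it defaults to the equal priors stated in the problem.

```r
# Hypothetical helper: Bayes' theorem with equal priors (1/3 each) by default
posterior <- function(lik, prior = rep(1 / 3, length(lik))) {
  lik * prior / sum(lik * prior)
}

post_B <- posterior(c(DOM = 1/2, AR = 3/8,  SL = 1/2))  # exactly one of two males affected
post_C <- posterior(c(DOM = 1/4, AR = 1/16, SL = 0))    # one male and one female both affected

print(rbind(B = post_B, C = post_C))
```

A zero likelihood, such as sex-linked inheritance with an affected girl, forces a zero posterior whatever the prior.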
3.49
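Let $D$ = {male sibling affected, female sibling not affected}. Then
$$\Pr(D \mid \text{DOM}) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}, \quad \Pr(D \mid \text{AR}) = \frac{1}{4} \times \frac{3}{4} = \frac{3}{16}, \quad \Pr(D \mid \text{SL}) = \frac{1}{2} \times 1 = \frac{1}{2}$$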
Notice that the event $D$ is not the same as the event that exactly one sibling is affected, since we are specifying which of the two siblings is affected. We have
$$\Pr(\text{DOM} \mid D) = \frac{\Pr(D \mid \text{DOM})}{\Pr(D \mid \text{DOM}) + \Pr(D \mid \text{AR}) + \Pr(D \mid \text{SL})} = \frac{\frac{1}{4}}{\frac{1}{4} + \frac{3}{16} + \frac{1}{2}} = \frac{\frac{1}{4}}{\frac{15}{16}} = \frac{4}{15}$$
$$\Pr(\text{AR} \mid D) = \frac{\Pr(D \mid \text{AR})}{\Pr(D \mid \text{DOM}) + \Pr(D \mid \text{AR}) + \Pr(D \mid \text{SL})} = \frac{\frac{3}{16}}{\frac{15}{16}} = \frac{1}{5}$$
$$\Pr(\text{SL} \mid D) = \frac{\Pr(D \mid \text{SL})}{\Pr(D \mid \text{DOM}) + \Pr(D \mid \text{AR}) + \Pr(D \mid \text{SL})} = \frac{\frac{1}{2}}{\frac{15}{16}} = \frac{8}{15}$$
Thus, in this situation the sex-linked mode of inheritance is the most likely.
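The likelihoods for $D$ can also be built from per-child probabilities rather than entered directly. The sketch below assumes, as the solution does, that the male's and female's statuses are independent given the mode. The per-child values are those used in Problems 3.32 to 3.42, with a probability of 0 for an affected female under sex-linked inheritance.

```r
# Per-child probability of being affected under each mode (values used in the text)
p_male   <- c(DOM = 1/2, AR = 1/4, SL = 1/2)
p_female <- c(DOM = 1/2, AR = 1/4, SL = 0)

# D = {male affected, female not affected}; siblings independent given the mode
lik_D <- p_male * (1 - p_female)

# Equal priors cancel, so the posterior is the normalized likelihood
post_D <- lik_D / sum(lik_D)

print(post_D)
```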
3.50
3.51
3.52
3.53
$$\Pr(\text{mother current smoker} \cap \text{father current smoker}) = \Pr(\text{mother current smoker}) \times \Pr(\text{father current smoker}) = 0.4 \times 0.5 = 0.20$$
3.54
$$\Pr(\text{father current smoker} \mid \text{mother not current smoker}) = \Pr(\text{father current smoker}) = 0.5$$
This is a conditional probability, whereas Problem 3.53 asked for a joint probability.
3.55
$$\Pr(\text{father current smoker} \cap \text{mother not current smoker}) = \Pr(\text{father current smoker}) \times \Pr(\text{mother not current smoker} \mid \text{father current smoker}) = 0.5 \times (1 - 0.6) = 0.20$$
3.56
The smoking habits of the parents are not independent random variables because
$$\Pr(\text{mother current smoker} \mid \text{father current smoker}) = 0.6 \ne \Pr(\text{mother current smoker} \mid \text{father not current smoker}) = 0.2$$
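The sketch below builds the full joint distribution of the parents' smoking status. It uses the quantities given in Problems 3.55 and 3.56: Pr(father smoker) = 0.5, Pr(mother smoker | father smoker) = 0.6, and Pr(mother smoker | father non-smoker) = 0.2. The 2 x 2 table layout is an illustrative choice. The table reproduces the 0.20 of Problem 3.55 and shows that Pr(both smoke) = 0.30. This differs from the product of the margins, which is another way to see the dependence noted in Problem 3.56.

```r
p_F            <- 0.5   # Pr(father current smoker)
p_M_given_F    <- 0.6   # Pr(mother smoker | father smoker)
p_M_given_notF <- 0.2   # Pr(mother smoker | father non-smoker)

# Joint distribution via the multiplication law: rows = father, columns = mother
joint <- matrix(c(p_F * p_M_given_F,          p_F * (1 - p_M_given_F),
                  (1 - p_F) * p_M_given_notF, (1 - p_F) * (1 - p_M_given_notF)),
                nrow = 2, byrow = TRUE,
                dimnames = list(father = c("smoker", "non"),
                                mother = c("smoker", "non")))

p_M <- sum(joint[, "smoker"])   # marginal Pr(mother current smoker)

print(joint)
```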
3.57
Let $A$ = {child has asthma}, $M$ = {mother current smoker}, $\bar{M}$ = {mother not current smoker}, $F$ = {father current smoker}, $\bar{F}$ = {father not current smoker}. We want $\Pr(A)$. We have that
$$\Pr(A) = \Pr(A \mid M \cap F) \times \Pr(M \cap F) + \Pr(A \mid M \cap \bar{F}) \times \Pr(M \cap \bar{F}) + \Pr(A \mid \bar{M} \cap F) \times \Pr(\bar{M} \cap F) + \Pr(A \mid \bar{M} \cap \bar{F}) \times \Pr(\bar{M} \cap \bar{F})$$
We are given that
$$\Pr(A \mid M \cap F) = 0.15, \quad \Pr(A \mid M \cap \bar{F}) = 0.13, \quad \Pr(A \mid \bar{M} \cap F) = 0.05, \quad \Pr(A \mid \bar{M} \cap \bar{F}) = 0.04$$
Also,
$$\Pr(M \cap F) = \Pr(F) \times \Pr(M \mid F) = 0.5 \times 0.6 = 0.30$$
$$\Pr(M \cap \bar{F}) = \Pr(\bar{F}) \times \Pr(M \mid \bar{F}) = 0.5 \times 0.2 = 0.10$$
$$\Pr(\bar{M} \cap F) = \Pr(F) \times \Pr(\bar{M} \mid F) = 0.5 \times 0.4 = 0.20$$
$$\Pr(\bar{M} \cap \bar{F}) = \Pr(\bar{F}) \times \Pr(\bar{M} \mid \bar{F}) = 0.5 \times 0.8 = 0.40$$
Therefore,
$$\Pr(A) = .15 \times .30 + .13 \times .10 + .05 \times .20 + .04 \times .40 = .084$$
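The total-probability sum can be written as a dot product of the conditional asthma risks and the joint probabilities of the four parental smoking combinations. The short labels `MF`, `M_notF`, and so on are hypothetical names introduced for illustration. The numbers are those given above.

```r
# Pr(asthma | parental smoking combination), given in the problem
p_A_given <- c(MF = 0.15, M_notF = 0.13, notM_F = 0.05, notM_notF = 0.04)

# Joint probabilities of the combinations: Pr(father status) x Pr(mother status | father status)
p_joint <- c(MF = 0.5 * 0.6, M_notF = 0.5 * 0.2, notM_F = 0.5 * 0.4, notM_notF = 0.5 * 0.8)

# Total probability rule
p_A <- sum(p_A_given * p_joint)

print(round(p_A, 3))
```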
3.58
We want to compute $\Pr(F \mid A)$. We have from the definition of conditional probability that
$$\Pr(F \mid A) = \frac{\Pr(F \cap A)}{\Pr(A)} = \frac{\Pr(F \cap A)}{.084}$$
Furthermore,
$$\Pr(F \cap A) = \Pr(M \cap F \cap A) + \Pr(\bar{M} \cap F \cap A) = \Pr(A \mid M \cap F) \times \Pr(M \cap F) + \Pr(A \mid \bar{M} \cap F) \times \Pr(\bar{M} \cap F)$$
Referring to Problem 3.57, we note that $\Pr(F \cap A) = .15 \times .30 + .05 \times .20 = .055$. Thus,
$$\Pr(F \mid A) = \frac{.055}{.084} = .655$$
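The conditional probabilities given asthma in Problems 3.58 and 3.59 come from summing the relevant terms of the total-probability sum and dividing by $\Pr(A)$. In the sketch below, the logical selector vectors mark which combinations have a smoking father or a smoking mother. The combination labels are hypothetical names introduced for illustration.

```r
p_A_given <- c(MF = 0.15, M_notF = 0.13, notM_F = 0.05, notM_notF = 0.04)
p_joint   <- c(MF = 0.30, M_notF = 0.10, notM_F = 0.20, notM_notF = 0.40)

p_and_A <- p_A_given * p_joint   # Pr(combination and A) for each combination
p_A     <- sum(p_and_A)

father_smokes <- c(MF = TRUE, M_notF = FALSE, notM_F = TRUE,  notM_notF = FALSE)
mother_smokes <- c(MF = TRUE, M_notF = TRUE,  notM_F = FALSE, notM_notF = FALSE)

# Definition of conditional probability: Pr(F | A) = Pr(F and A) / Pr(A)
p_F_given_A <- sum(p_and_A[father_smokes]) / p_A
p_M_given_A <- sum(p_and_A[mother_smokes]) / p_A

print(round(c(F_given_A = p_F_given_A, M_given_A = p_M_given_A), 3))
```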
3.59
We want to compute $\Pr(M \mid A)$. We have that $\Pr(M \mid A) = \frac{\Pr(M \cap A)}{\Pr(A)}$, where
$$\Pr(M \cap A) = \Pr(M \cap F \cap A) + \Pr(M \cap \bar{F} \cap A) = \Pr(A \mid M \cap F) \times \Pr(M \cap F) + \Pr(A \mid M \cap \bar{F}) \times \Pr(M \cap \bar{F}) = 0.15 \times 0.30 + 0.13 \times 0.10 = 0.058$$
Thus,
$$\Pr(M \mid A) = \frac{.058}{.084} = .690$$
3.60
We want to compute $\Pr(F \mid \bar{A})$. We have that $\Pr(F \mid \bar{A}) = \frac{\Pr(F \cap \bar{A})}{\Pr(\bar{A})}$, where
$$\Pr(F \cap \bar{A}) = \Pr(M \cap F \cap \bar{A}) + \Pr(\bar{M} \cap F \cap \bar{A}) = \Pr(\bar{A} \mid M \cap F) \times \Pr(M \cap F) + \Pr(\bar{A} \mid \bar{M} \cap F) \times \Pr(\bar{M} \cap F) = (1 - 0.15) \times 0.30 + (1 - 0.05) \times 0.20 = 0.445$$
Since $\Pr(\bar{A}) = 1 - .084 = 0.916$,
$$\Pr(F \mid \bar{A}) = \frac{0.445}{0.916} = 0.486$$
3.61
We want to compute $\Pr(M \mid \bar{A})$. We have $\Pr(M \mid \bar{A}) = \frac{\Pr(M \cap \bar{A})}{\Pr(\bar{A})}$, where
$$\Pr(M \cap \bar{A}) = \Pr(M \cap F \cap \bar{A}) + \Pr(M \cap \bar{F} \cap \bar{A}) = \Pr(\bar{A} \mid M \cap F) \times \Pr(M \cap F) + \Pr(\bar{A} \mid M \cap \bar{F}) \times \Pr(M \cap \bar{F}) = (1 - 0.15) \times 0.30 + (1 - 0.13) \times 0.10 = 0.342$$
Thus,
$$\Pr(M \mid \bar{A}) = \frac{0.342}{0.916} = 0.373$$
3.62
We found in Problem 3.58 that $\Pr(F \mid A) = .655$ and in Problem 3.60 that $\Pr(F \mid \bar{A}) = .486$. Since $\Pr(F \mid A) \ne \Pr(F \mid \bar{A})$, the father's smoking status and the child's asthma status are not independent.
3.63
We found in Problem 3.59 that $\Pr(M \mid A) = .690$ and in Problem 3.61 that $\Pr(M \mid \bar{A}) = 0.373$. Since $\Pr(M \mid A) = 0.690 \ne \Pr(M \mid \bar{A}) = 0.373$, the mother's smoking status and the child's asthma status are not independent.
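Problems 3.60 to 3.63 can be checked together. The sketch below forms $\Pr(\text{combination} \cap A)$ and $\Pr(\text{combination} \cap \bar{A})$ for the four parental smoking combinations. It then conditions on each outcome and compares $\Pr(F \mid A)$ with $\Pr(F \mid \bar{A})$, and likewise for $M$. The helper `cond` and the combination ordering are hypothetical choices made for illustration.

```r
# Order of combinations: (M and F), (M and not F), (not M and F), (not M and not F)
p_A_given <- c(0.15, 0.13, 0.05, 0.04)   # Pr(A | combination), from Problem 3.57
p_joint   <- c(0.30, 0.10, 0.20, 0.40)   # Pr(combination), from Problem 3.57
father    <- c(TRUE, FALSE, TRUE, FALSE) # father smokes in this combination
mother    <- c(TRUE, TRUE, FALSE, FALSE) # mother smokes in this combination

p_with    <- p_A_given * p_joint         # Pr(combination and A)
p_without <- (1 - p_A_given) * p_joint   # Pr(combination and not A)

# Hypothetical helper: Pr(selected combinations | outcome) = selected mass / total mass
cond <- function(sel, p) sum(p[sel]) / sum(p)

res <- c(F_given_A    = cond(father, p_with),
         F_given_notA = cond(father, p_without),
         M_given_A    = cond(mother, p_with),
         M_given_notA = cond(mother, p_without))

print(round(res, 3))
```

Unequal values within each pair (father, mother) indicate that parental smoking and childhood asthma are not independent.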