Solution Manual for Fundamentals of Biostatistics, 8th Edition

Complete Solutions Manual to Accompany Fundamentals of Biostatistics, Eighth Edition
Bernard Rosner, Harvard University, Cambridge, MA
Prepared by Roland A. Matsouaka, Duke University, Durham, NC
ISBN-13: 978-1-305-26905-7. ISBN-10: 1-305-26905-5
© 2016 Cengage Learning. All rights reserved.
Contents

Chapter 2 Descriptive Statistics
Chapter 3 Probability
Chapter 4 Discrete Probability Distributions
Chapter 5 Continuous Probability Distributions
Chapter 6 Estimation
Chapter 7 Hypothesis Testing: One-Sample Inference
Chapter 8 Hypothesis Testing: Two-Sample Inference
Chapter 9 Nonparametric Methods
Chapter 10 Hypothesis Testing: Categorical Data
Chapter 11 Regression and Correlation Methods
Chapter 12 Multisample Inference
Chapter 13 Design and Analysis Techniques for Epidemiologic Studies
Chapter 14 Hypothesis Testing: Person-Time Data
CHAPTER 2/DESCRIPTIVE STATISTICS

2.1 We have

$$\bar{x} = \frac{\sum x_i}{n} = \frac{215}{25} = 8.6 \text{ days}$$

median = $\frac{n+1}{2}$th largest observation = 13th largest observation = 8 days

2.2 We have that

$$s^2 = \frac{\sum_{i=1}^{25} (x_i - \bar{x})^2}{24} = \frac{(5 - 8.6)^2 + \cdots + (4 - 8.6)^2}{24} = \frac{784}{24} = 32.67$$

$s$ = standard deviation = $\sqrt{\text{variance}}$ = 5.72 days
range = largest − smallest observation = 30 − 3 = 27 days

2.3 Suppose we divide the patients according to whether or not they received antibiotics, and calculate the mean and standard deviation for each of the two subsamples:

                                       x-bar     s      n
Antibiotics                            11.57    8.81    7
No antibiotics                          7.44    3.70   18
Antibiotics, excluding observation 7    8.50    3.73    6

It appears that antibiotic users stay longer in the hospital. Note that when we remove observation 7, the two standard deviations are in substantial agreement, and the difference in the means is not that impressive anymore. This example shows that $\bar{x}$ and $s^2$ are not robust; that is, their values are easily affected by outliers, particularly in small samples. Therefore, we would not conclude that hospital stay is different for antibiotic users vs. non-antibiotic users.

2.4-2.7 Changing the scale by a factor c will multiply each data value $x_i$ by c, changing it to $cx_i$. Again the same individual's value will be at the median and the same individual's value will be at the mode, but these values will be multiplied by c. The geometric mean will be multiplied by c also, as can easily be shown:

$$\text{Geometric mean} = [(cx_1)(cx_2)\cdots(cx_n)]^{1/n} = (c^n x_1 \cdot x_2 \cdots x_n)^{1/n} = c\,(x_1 \cdot x_2 \cdots x_n)^{1/n} = c \times \text{old geometric mean}$$

The range will also be multiplied by c.
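The scaling argument in 2.4-2.7 can be checked numerically. The following is a minimal sketch, not part of the manual: the data vector is made up for illustration, and the helpers geo_mean and stat_mode are hypothetical names introduced here (base R has no built-in geometric mean or statistical mode).

```r
# Made-up positive values, for illustration only.
x <- c(3, 5, 5, 8, 12)
c_scale <- 2

# Hypothetical helper: geometric mean via the mean of the logs.
geo_mean <- function(v) exp(mean(log(v)))

# Hypothetical helper: most frequent value (first one if there are ties).
stat_mode <- function(v) as.numeric(names(which.max(table(v))))

# Each location measure, and the range, is multiplied by c_scale.
median_scales <- isTRUE(all.equal(median(c_scale * x), c_scale * median(x)))
mode_scales   <- isTRUE(all.equal(stat_mode(c_scale * x), c_scale * stat_mode(x)))
geo_scales    <- isTRUE(all.equal(geo_mean(c_scale * x), c_scale * geo_mean(x)))
range_scales  <- isTRUE(all.equal(diff(range(c_scale * x)),
                                  c_scale * diff(range(x))))
```

All four flags should come out TRUE for any positive scale factor; the geometric mean additionally requires strictly positive data because of the logarithm.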
For example, if c = 2 we have:

Original scale   x_i    -3   -2   -1    0    1    2    3
Scale 2          x_i    -6   -4   -2    0    2    4    6

2.8 We first read the data file "running time" in R

> require(xlsx)
> running <- read.xlsx(...)   # call truncated in the source; file and sheet arguments not recoverable
> head(running)
  week  time
1    1 12.80
2    2 12.20
3    3 12.25
4    4 12.18
5    5 11.53
6    6 12.47

The mean 1-mile running time over 18 weeks is equal to 12.09 minutes:

> mean(running$time)
[1] 12.08889

2.9 The standard deviation is given by

> sd(running$time)
[1] 0.3874181

2.10 Let us first create the variable "time_100" and then calculate its mean and standard deviation

> running$time_100=100*running$time
> mean(running$time_100)
[1] 1208.889
> sd(running$time_100)
[1] 38.74181

2.11 Let us construct the stem-and-leaf plot in R using the stem.leaf command from the package "aplpack"

> require(aplpack)
> stem.leaf(running$time_100, unit=1, trim.outliers=FALSE)
1 | 2: represents 12
 leaf unit: 1
            n: 18
    2    115 | 37
    3    116 | 7
    5    117 | 23
    7    118 | 03
    8    119 | 2
   (1)   120 | 8
    9    121 | 8
    8    122 | 05
    6    123 | 03
    4    124 | 7
    3    125 | 5
    2    126 | 7
         127 |
    1    128 | 0

Note: one can also use the standard command stem (which does not require the "aplpack" package) to get a similar plot

> stem(running$time_100, scale = 4)

2.12 The quantiles of the running times are

> quantile(running$time)
     0%     25%     50%     75%    100%
11.5300 11.7475 12.1300 12.3225 12.8000

[Figure: Box plot of running times; vertical axis: Time, 11.6 to 12.8]

An outlying value is identified as any value x such that

$$x > \text{upper quartile} + 1.5 \times (\text{upper quartile} - \text{lower quartile}) = 12.32 + 1.5 \times (12.32 - 11.75) \approx 12.32 + 0.85 = 13.17$$

Since 12.97 minutes is smaller than the largest nonoutlying value (13.17 minutes), this running time recorded in his first week of running in the spring is not an outlying value relative to the distribution of running times recorded the previous year.
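The figures in 2.8 through 2.12 can be re-derived without the spreadsheet. The sketch below is not from the manual: it assumes the 18 running times can be read back off the stem-and-leaf display in 2.11 (leaf unit 1 on the time_100 scale), and the names time_100, time, q, and upper_fence are illustrative.

```r
# Running times x 100, read off the stem-and-leaf display in 2.11
# (assumption: each stem/leaf pair is one observation, e.g. 115 | 3 -> 1153).
time_100 <- c(1153, 1157, 1167, 1172, 1173, 1180, 1183, 1192, 1208,
              1218, 1220, 1225, 1230, 1233, 1247, 1255, 1267, 1280)
time <- time_100 / 100

mean_time <- mean(time)   # about 12.09 minutes (2.8)
sd_time <- sd(time)       # about 0.387 minutes (2.9)

# Rescaling by 100 multiplies both the mean and the sd by 100 (2.10).
mean_rescales <- isTRUE(all.equal(mean(time_100), 100 * mean_time))
sd_rescales <- isTRUE(all.equal(sd(time_100), 100 * sd_time))

# Quartiles with R's default quantile rule (type 7), as in 2.12.
q <- quantile(time, c(0.25, 0.75), names = FALSE)

# Upper outlier fence: upper quartile + 1.5 x interquartile range.
upper_fence <- q[2] + 1.5 * (q[2] - q[1])

# TRUE would flag the new 12.97-minute time as an outlying value.
is_outlying <- 12.97 > upper_fence
```

Working with the unrounded quartiles puts the fence at about 13.185 rather than the 13.17 obtained from the rounded quartiles in the text; the conclusion about the 12.97-minute time is the same either way.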
2.13 The mean is

$$\bar{x} = \frac{\sum x_i}{24} = \frac{469}{24} = 19.54 \text{ mg/dL}$$

2.14 We have that

$$s^2 = \frac{\sum_{i=1}^{24} (x_i - \bar{x})^2}{23} = \frac{(49 - 19.54)^2 + \cdots + (12 - 19.54)^2}{23} = \frac{6495.96}{23} = 282.43$$

$$s = \sqrt{282.43} = 16.81 \text{ mg/dL}$$

2.15 We provide two rows for each stem corresponding to leaves 5-9 and 0-4 respectively. We have

Stem-and-leaf plot   Cumulative frequency
+4 | 98              24
+4 | 1               22
+3 | 65              21
+3 | 21              19
+2 | 78              17
+2 | 13              15
+1 | 9699            13
+1 | 332              9
+0 | 88               6
+0 | 2                4
-0 |
-0 | 8                3
-1 | 03               2

2.16 We wish to compute the average of the (24/2)th and (24/2 + 1)th largest values = average of the 12th and 13th largest points. We note from the stem-and-leaf plot that the 13th largest point counting from the bottom is the largest value in the upper +1 row = 19. The 12th largest point = the next largest value in this row = 19. Thus, the median = $\frac{19 + 19}{2}$ = 19 mg/dL.

2.17 We first must compute the upper and lower quartiles. Because $24 \times (75/100) = 18$ is an integer, the upper quartile = average of the 18th and 19th largest values = $\frac{32 + 31}{2} = 31.5$. Similarly, because $24 \times (25/100) = 6$ is an integer, the lower quartile = average of the 6th and 7th smallest points = $\frac{8 + 12}{2} = 10$.

Second, we identify outlying values. An outlying value is identified as any value x such that

$$x > \text{upper quartile} + 1.5 \times (\text{upper quartile} - \text{lower quartile}) = 31.5 + 1.5 \times (31.5 - 10) = 31.5 + 32.25 = 63.75$$

or

$$x < \text{lower quartile} - 1.5 \times (\text{upper quartile} - \text{lower quartile}) = 10 - 1.5 \times (31.5 - 10) = 10 - 32.25 = -22.25$$

From the stem-and-leaf plot, we note that the range is from −13 to +49. Therefore, there are no outlying values.
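The arithmetic in 2.13 through 2.17 can be reproduced from the stem-and-leaf plot alone. This is a sketch under stated assumptions rather than the manual's own code: the 24 change scores are read off the plot in 2.15, and quantile(..., type = 2) is used because it averages the two neighbouring order statistics when n × p is an integer, which matches the quartile rule in 2.17. The variable names are illustrative.

```r
# Change in cholesterol (mg/dL), read off the stem-and-leaf plot in 2.15
# (assumption: the three negative values are -8, -10 and -13, which is what
# the stated range of -13 to +49 and the total of 469 require).
change <- c(49, 48, 41, 36, 35, 32, 31, 27, 28, 21, 23,
            19, 16, 19, 19, 13, 13, 12, 8, 8, 2, -8, -10, -13)

n <- length(change)                       # 24 observations
xbar <- sum(change) / n                   # 469 / 24
s2 <- sum((change - xbar)^2) / (n - 1)    # 6495.96 / 23
s <- sqrt(s2)

# Lower and upper quartiles with the averaging rule of 2.17.
quartiles <- quantile(change, c(0.25, 0.75), type = 2, names = FALSE)
iqr <- quartiles[2] - quartiles[1]
lower_fence <- quartiles[1] - 1.5 * iqr
upper_fence <- quartiles[2] + 1.5 * iqr

# Values outside the fences; expected to be empty here.
outliers <- change[change < lower_fence | change > upper_fence]
```

With these values the quartiles come out as 10 and 31.5 and the fences as −22.25 and 63.75, so no observation is flagged.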
Thus, the box plot is as follows:

Stem-and-leaf plot   Cumulative frequency   Box plot
+4 | 98              24                        |
+4 | 1               22                        |
+3 | 65              21                        |
+3 | 21              19                     +-----+
+2 | 78              17                     |     |
+2 | 13              15                     |     |
+1 | 9699            13                     *--+--*
+1 | 332              9                     +-----+
+0 | 88               6                        |
+0 | 2                4                        |
-0 |                                           |
-0 | 8                3                        |
-1 | 03               2                        |

Comments: The distribution is reasonably symmetric, since the mean = 19.54 mg/dL ≈ 19 mg/dL = median. This is also manifested by the percentiles of the distribution, since the upper quartile − median = 31.5 − 19 = 12.5 ≈ median − lower quartile = 19 − 10 = 9. The box plot looks deceptively asymmetric, since 19 is the highest value in the upper +1 row and 10 is the lowest value in the lower +1 row.

2.18 To compute the median cholesterol level, we construct a stem-and-leaf plot of the before-cholesterol measurements as follows.

Stem-and-leaf plot   Cumulative frequency
25 | 0               24
24 | 4               23
23 | 68              22
22 | 42              20
21 |
20 | 5               18
19 | 5277            17
18 | 0               13
17 | 8               12
16 | 698871          11
15 | 981              5
14 | 5                2
13 | 7                1

Based on the cumulative frequency column, we see that the median = average of the 12th and 13th largest values = $\frac{178 + 180}{2}$ = 179 mg/dL. Therefore, we look at the change scores among persons with baseline cholesterol ≥ 179 mg/dL and < 179 mg/dL, respectively. A stem-and-leaf plot of the change scores in these two groups is given as follows:

Baseline ≥ 179 mg/dL      Baseline < 179 mg/dL
Stem-and-leaf plot        Stem-and-leaf plot
+4 | 98                   +4 |
+4 |                      +4 | 1
+3 | 65                   +3 |
+3 | 2                    +3 | 1
+2 | 78                   +2 |
+2 | 1                    +2 | 3
+1 | 699                  +1 | 9
+1 |                      +1 | 332
+0 | 8                    +0 | 8
+0 |                      +0 | 2
-0 |                      -0 |
-0 |                      -0 | 8
-1 |                      -1 | 03

Clearly, from the plot, the effect of diet on cholesterol is much greater among individuals who start with relatively high cholesterol levels (≥ 179 mg/dL) versus those who start with relatively low levels (< 179 mg/dL). This is also evidenced by the mean change in cholesterol levels in the two groups, which is 28.2 mg/dL in the ≥ 179 mg/dL group and 10.9 mg/dL in the < 179 mg/dL group.
We will be discussing the formal statistical methods for comparing mean changes in two groups in our work on two-sample inference in Chapter 8.

2.19 We first calculate the difference scores between the two positions:

Subject                Systolic           Diastolic
number    Subject      difference score   difference score
 1        B.R.A.        -6                 -8
 2        J.A.B.        +2                 -2
 3        F.L.B.        +6                 +4
 4        V.P.B.        +8                 -4
 5        M.F.B.        +8                 +2
 6        E.H.B.       +12                 +4
 7        G.C.         +10                  0
 8        M.M.C.         0                 -2
 9        T.J.F.        -2                 -8
10        R.R.F.        +4                 -2
11        C.R.F.        +8                 -2
12        E.W.G.       +14                 +4
13        T.F.H.        +2                -14
14        E.J.H.        +6                 -2
15        H.B.H.       +26                  0
16        R.T.K.        +8                 +8
17        W.E.L.       +10                 +4
18        R.L.L.       +12                 +2
19        H.S.M.       +14                 +8
20        V.J.M.        -8                 -2
21        R.H.P.       +10                +14
22        R.C.R.       +14                 +4
23        J.A.R.       +14                  0
24        A.K.R.        +4                 +4
25        T.H.S.        +6                 +4
26        O.E.S.       +16                 +2
27        R.E.S.       +28                +16
28        E.C.T.       +18                 -4
29        J.H.T.       +14                 +4
30        F.P.V.        +4                 -6
31        P.F.W.       +12                 +6
32        W.J.W.        +8                 -4

Second, we calculate the mean difference scores:

$$\bar{x}_{sys} = \frac{-6 + \cdots + 8}{32} = \frac{282}{32} = 8.8 \text{ mm Hg}$$

$$\bar{x}_{dias} = \frac{-8 + \cdots + (-4)}{32} = \frac{30}{32} = 0.9 \text{ mm Hg}$$

The median difference scores are given by the average of the 16th and 17th largest values. Thus,

$$\text{median}_{sys} = \frac{8 + 8}{2} = 8 \text{ mm Hg}, \qquad \text{median}_{dias} = \frac{0 + 2}{2} = 1 \text{ mm Hg}$$

2.20 The stem-and-leaf and box plots allowing two rows for each stem are given as follows:

Systolic Blood Pressure

Stem-and-leaf plot    Cumulative frequency   Box plot
 2 | 68               32                        |
 2 |                                            |
 1 | 68               30                        |
 1 | 20402404442      28                     +-----+
 0 | 68886868         17                     *--+--*
 0 | 204244            9                     +-----+
-0 | 2                 3                        |
-0 | 68                2                        |

Median = 8, upper quartile = $\frac{14 + 14}{2} = 14$, lower quartile = $\frac{4 + 4}{2} = 4$, outlying values: $x > 14 + 1.5 \times (14 - 4) = 29$ or $x < 4 - 1.5 \times (14 - 4) = -11$. Since the range of values is from −8 to +28, there are no outlying values for systolic blood pressure.

Diastolic Blood Pressure

Stem-and-leaf plot    Cumulative frequency   Box plot
 1 | 6                32                        0
 1 | 4                31                        0
 0 | 886              30                        |
 0 | 42404042404424   27                     +--+--+
-0 | 242222244        13                     +-----+
-0 | 886               4                        |
-1 | 4                 1                        0

Median = 1, upper quartile = $\frac{4 + 4}{2} = 4$, lower quartile = $\frac{-2 - 2}{2} = -2$, outlying values: $x > 4 + 1.5 \times (4 + 2) = 13.0$ or $x < -2 - 1.5 \times (4 + 2) = -11.0$. The values +16, +14 and −14 are outlying values.

2.21 Systolic blood pressure clearly seems to be higher in the supine (recumbent) position than in the standing position. Diastolic blood pressure appears to be comparable in the two positions. The distributions are each reasonably symmetric.

2.22 The upper and lower deciles for postural change in systolic blood pressure (SBP) are 14 and 0. Thus, the normal range for postural change in SBP is $0 \le x \le 14$. The upper and lower deciles for postural change in diastolic blood pressure (DBP) are 8 and −6. Thus, the normal range for postural change in DBP is $-6 \le x \le 8$.
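The means, medians, quartiles, and outlier limits in 2.19 and 2.20 can be recomputed from the difference scores tabulated in 2.19. The sketch below is illustrative only: box_summary is a hypothetical helper introduced here, and type = 2 is an assumption chosen because it reproduces the "average of the two middle order statistics" rule used in these solutions.

```r
# Difference scores for the 32 subjects, in the order of the table in 2.19.
sys <- c(-6, 2, 6, 8, 8, 12, 10, 0, -2, 4,
         8, 14, 2, 6, 26, 8, 10, 12, 14, -8,
         10, 14, 14, 4, 6, 16, 28, 18, 14, 4, 12, 8)
dias <- c(-8, -2, 4, -4, 2, 4, 0, -2, -8, -2,
          -2, 4, -14, -2, 0, 8, 4, 2, 8, -2,
          14, 4, 0, 4, 4, 2, 16, -4, 4, -6, 6, -4)

# Hypothetical helper: mean, quartiles, fences, and flagged values.
box_summary <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), type = 2, names = FALSE)
  iqr <- q[3] - q[1]
  lower <- q[1] - 1.5 * iqr
  upper <- q[3] + 1.5 * iqr
  list(mean = mean(x), median = q[2], lower_q = q[1], upper_q = q[3],
       fences = c(lower, upper),
       outliers = sort(x[x < lower | x > upper]))
}

sys_summary <- box_summary(sys)
dias_summary <- box_summary(dias)
```

Returning the fences together with the flagged values makes it easy to see why systolic has no outlying values while diastolic has three.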
2.23

Id       Age        FEV        Hgt        Sex        Smoke
301       9         1.708      57         0          0
451       8         1.724      67.5       0          0
……
61951    15         2.278      60         0          1
63241    16         4.504      72         1          0
71141    17         5.638      70         1          0
71142    16         4.872      72         1          1
73041    16         4.27       67         1          1
73042    15         3.727      68         1          1
73751    18         2.853      60         0          0
75852    16         2.795      63         0          1
77151    15         3.211      66.5       0          0
MEAN      9.931193  2.63678    61.14358   0.513761   0.099388
MEDIAN   10         2.5475     61.5
SD        2.953935  0.867059    5.703513

[Figures: Histogram of Age; Boxplot of FEV; Boxplot of Hgt; Chart of Sex; Chart of Smoke]

2.24

Results for Sex = 0

Variable  Age    Mean     StDev    Minimum   Median   Maximum
FEV        3     1.0720   *        1.0720    1.0720   1.0720
           4     1.316    0.290    0.839     1.404    1.577
           5     1.3599   0.2513   0.7910    1.3715   1.7040
           6     1.6477   0.2182   1.3380    1.6720   2.1020
           7     1.8330   0.3136   1.3700    1.7420   2.5640
           8     2.1490   0.4046   1.2920    2.1900   2.9930
           9     2.3753   0.4407   1.5910    2.3810   3.2230
          10     2.6814   0.4304   1.4580    2.6895   3.4130
          11     2.8482   0.4293   2.0810    2.8220   3.7740
          12     2.9481   0.3679   2.3470    2.8890   3.8350
          13     3.0656   0.4321   2.2160    3.1135   3.8160
          14     2.962    0.383    2.236     2.997    3.428
          15     2.761    0.415    2.198     2.783    3.330
          16     3.058    0.397    2.608     2.942    3.674
          17     3.5000   *        3.5000    3.5000   3.5000
          18     2.9470   0.1199   2.8530    2.9060   3.0820
          19     3.4320   0.1230   3.3450    3.4320   3.5190

Results for Sex = 1

Variable  Age    Mean     StDev    Minimum   Median   Maximum
FEV        3     1.4040   *        1.4040    1.4040   1.4040
           4     1.196    0.524    0.796     1.004    1.789
           5     1.7447   0.2336   1.3590    1.7920   2.1150
           6     1.6650   0.2304   1.3380    1.6580   2.2620
           7     1.9117   0.3594   1.1650    1.9050   2.5780
           8     2.0756   0.3767   1.4290    2.0690   2.9270
           9     2.4822   0.5086   1.5580    2.4570   3.8420
          10     2.6965   0.6020   1.6650    2.6080   4.5910
          11     3.2304   0.6459   1.6940    3.2060   4.6370
          12     3.509    0.871    1.916     3.530    5.224
          13     4.011    0.690    2.531     4.045    5.083
          14     3.931    0.635    2.276     3.882    4.842
          15     4.289    0.644    3.727     4.279    5.793
          16     4.193    0.437    3.645     4.270    4.872
          17     4.410    1.006    3.082     4.429    5.638
          18     4.2367   0.1597   4.0860    4.2200   4.4040
          19     5.1020   *        5.1020    5.1020   5.1020

Results for Sex = 0

Variable  Hgt     Mean
FEV       46.0    1.0720
          46.5    1.1960
          48.0    1.110
          49.0    1.4193
          50.0    1.3378
          51.0    1.5800
          51.5    1.474
          52.0    1.389
          52.5    1.577
          53.0    1.6887
          53.5    1.4150
          54.0    1.6408
          54.5    1.7483
          55.0    1.6313
          55.5    2.036
          56.0    1.651
          56.5    1.7875
          57.0    1.9037
          57.5    1.9300
          58.0    2.1934
          58.5    1.9440
          59.0    2.1996
          59.5    2.517
          60.0    2.5659
          60.5    2.5563
          61.0    2.6981
          61.5    2.626
          62.0    2.7861
          62.5    2.7777
          63.0    2.7266
          63.5    2.995
          64.0    2.9731
          64.5    2.864
          65.0    3.090
          65.4    2.4340
          65.5    3.154
          66.0    2.984
          66.5    3.2843
          67.0    3.167
          67.5    2.922
          68.0    3.214
          68.5    3.3300
          69.5    3.8350
          71.0    2.5380

Results for Sex = 1

Variable  Hgt     Mean
FEV       47.0    0.981
          48.0    1.270
          49.5    1.4250
          50.0    1.794
          50.5    1.536
          51.0    1.683
          51.5    1.514
          52.0    1.5915
          52.5    1.7100
          53.0    1.6646
          53.5    1.974
          54.0    1.7809
          54.5    1.8380
          55.0    1.8034
          55.5    1.8070
          56.0    2.025
          56.5    1.879
          57.0    2.0875
          57.5    1.829
          58.0    2.0169
          58.5    2.131
          59.0    2.350
          59.5    2.515
          60.0    2.279
          60.5    2.3253
          61.0    2.4699
          61.5    2.5410
          62.0    2.658
          62.5    2.829
          63.0    2.877
          63.5    2.757
          64.0    2.697
          64.5    3.100
          65.0    2.770
          65.5    3.0343
          66.0    3.115
          66.5    3.353
          67.0    3.779
          67.5    3.612
          68.0    3.878
          68.5    3.872
          69.0    4.022
          69.5    3.743
          70.0    4.197
          70.5    3.931
          71.0    4.310
          71.5    4.7200
          72.0    4.361
          72.5    4.2720
          73.0    5.255
          73.5    3.6450
          74.0    4.654

[Figure: Scatterplot of FEV vs Age, Hgt; panel variable: Sex]

Descriptive Statistics: FEV

Results for Sex = 0

Variable  Smoke   Mean     StDev
FEV       0       2.3792   0.6393
          1       2.9659   0.4229

Results for Sex = 1

Variable  Smoke   Mean     StDev
FEV       0       2.7344   0.9741
          1       3.743    0.889

[Figure: Boxplot of FEV by Smoke and Sex]

2.25 Looking at the scatterplot of FEV vs. Age, we find that FEV increases with age for both boys and girls, at approximately the same rate.
However, the spread (standard deviation) of FEV values appears to be higher in the male group than in the female group.

2.26

Variable           Mean      StDev     Median
Sat. Fat – DR      14.557    7.536     12.000
Sat. Fat – FFQ      7.898    9.695      3.159
Tot. Fat – DR      64.238    9.894     63.500
Tot. Fat – FFQ     15.21    27.00       1.00
Alcohol – DR        2.470    6.314      0.000
Alcohol – FFQ       8.951   12.255      4.550
Calories – DR    1619.9    323.4     1606.0
Calories – FFQ   1371.7    482.1     1297.6

[Figure: Boxplot of Calories (Calories – DR, Calories – FFQ)]

[Figure: Boxplot of Sat. Fat, Tot. Fat, and Alcohol (DR and FFQ for each)]

2.27

[Figure: Scatterplot of DR vs. FFQ values; panels: Sat. Fat – DR vs. Sat. Fat – FFQ, Tot. Fat – DR vs. Tot. Fat – FFQ, Alcohol – DR vs. Alcohol – FFQ, Calories – DR vs. Calories – FFQ]

If FFQ were a perfect substitute for DR, the points would line up in a straight line. If the two were unrelated, then we would expect to see a random pattern in each panel. The scatterplots seem to suggest that the DR and FFQ values are not highly related.

2.28 The 5×5 tables below show the number of people classified into a particular combination of quintile categories. For each table, the rows represent the quintiles of the DR, and the columns represent quintiles of the FFQ. Overall, we get the same impression that there is weak concordance between the two measures. However, we do notice that the agreement is greatest for the two measures with regard to alcohol consumption. Also, we note the relatively high level of agreement at the extremes of each nutrient; for example, the (1,1) and (5,5) cells generally contain the highest values.
Tabulated statistics: SFDQuin, SFFQuin

Rows: SFDQuin   Columns: SFFQuin

         1     2     3     4     5    All
1       15     8     9     2     1     35
2       10     6     6     8     5     35
3        4     7     8     9     6     34
4        6    10     6     9     4     35
5        0     3     6     7    18     34
All     35    34    35    35    34    173

Cell Contents: Count

Tabulated statistics: TFDQuin, TFFQuin

Rows: TFDQuin   Columns: TFFQuin

         1     2     3     4     5    All
1       13     9     8     5     1     36
2        9     5     7    10     3     34
3        4    10     8     6     6     34
4        8     6     3     9     9     35
5        1     5     8     5    15     34
All     35    35    34    35    34    173

Cell Contents: Count

Tabulated statistics: AlcDQuin, AlcFQuin

Rows: AlcDQuin   Columns: AlcFQuin

         1     2     3     4     5    All
1       28     5     2     0     0     35
2        6    23     6     0     0     35
3        0     9    14    10     1     34
4        0     1    10    16     8     35
5        0     0     0     8    26     34
All     34    38    32    34    35    173

Cell Contents: Count

Tabulated statistics: CalDQuin, CalFQuin

Rows: CalDQuin   Columns: CalFQuin

         1     2     3     4     5    All
1       10    11     8     4     2     35
2       11     4     9     7     4     35
3        5     9     6     8     6     34
4        4     8     7     6    10     35
5        5     3     4    10    12     34
All     35    35    34    35    34    173

2.29 Descriptive Statistics: Total Fat Density DR, Total Fat Density FFQ

Variable                 Mean      StDev    Median
Total Fat Density DR     38.066    4.205    38.646
Total Fat Density FFQ    36.855    6.729    36.366

[Figure: Scatterplot of Total Fat Density DR vs Total Fat Density FFQ]

2.30 The concordance for the quintiles of nutrient density does appear somewhat stronger than for the quintiles of raw nutrient data. In the table below, we see that 19 + 14 + 10 + 7 + 11 = 61 individuals were in the same quintile on both measures, compared to 13 + 5 + 8 + 9 + 15 = 50 people in the total fat table from question 2.28.

Tabulated statistics: Dens DR Quin, Dens FFQ Quin

Rows: Dens DR Quin   Columns: Dens FFQ Quin

         1     2     3     4     5    All
1       19     7     6     2     1     35
2        5    14     5     6     5     35
3        4     8    10     6     6     34
4        6     4     7     7    11     35
5        1     2     6    14    11     34
All     35    35    34    35    34    173

2.31 We find that exposed children (Lead type = 2) are somewhat younger and more likely to be male (Sex = 1), compared to unexposed children. The boxplot below shows all three lead types, but we are only interested in types 1 and 2.
[Figure: Boxplot of Age by Lead_type (types 1, 2, 3)]

Variable  Lead_type   Mean     StDev    Median
Age       1           893.8    360.2    905.0
          2           776.3    329.5    753.5

Tabulated statistics: Lead_type, Sex

Rows: Lead_type   Columns: Sex

         1         2         All
1        46        32        78
         58.97     41.03     100.00
2        17         7        24
         70.83     29.17     100.00

2.32 The exposed children have somewhat lower mean and median IQ scores compared to the unexposed children, but the differences don't appear to be very large.

Descriptive Statistics: Iqv, Iqp

Variable  Lead_type   Mean      StDev    Median
Iqv       1            85.14    14.69     85.00
          2            84.33    10.55     81.50
Iqp       1           102.71    16.79    101.00
          2            95.67    11.34     97.00

[Figure: Boxplot of Iqv, Iqp by Lead_type]

2.33 The coefficient of variation (CV) is given by $100\% \times (s / \bar{x})$, where $s$ and $\bar{x}$ are computed separately for each subject. We compute $\bar{x}$, $s$, and CV separately for each subject using the following function in R:

> cv_est <- function(x) {
+   # function body lost in the source; reconstructed to match the printed output
+   cat("Mean, SD, CV are\n")
+   c(mean(x), sd(x), 100 * sd(x) / mean(x))
+ }
> cv_est(c(2.22, 1.88))
Mean, SD, CV are
[1]  2.0500000  0.2404163 11.7276247

The results are shown in the table below:

APC resistance, Coefficient of Variation

Sample number    A       B       mean     sd      CV
 1               2.22    1.88    2.05     0.240   11.7
 2               3.42    3.59    3.505    0.120    3.4
 3               3.68    3.01    3.345    0.474   14.2
 4               2.64    2.37    2.505    0.191    7.6
 5               2.68    2.26    2.47     0.297   12.0
 6               3.29    3.04    3.165    0.177    5.6
 7               3.85    3.57    3.71     0.198    5.3
 8               2.24    2.29    2.265    0.035    1.6
 9               3.25    3.39    3.32     0.099    3.0
10               3.3     3.16    3.23     0.099    3.1
average CV                                         6.7

2.34 To obtain the average CV, we average the individual-specific CVs over the 10 samples. The average CV = 6.7%, which indicates excellent reproducibility.

2.35 We compute the mean and standard deviation of pod weight for both inoculated (I) and uninoculated (U) plants. The results are given as follows:

         I       U
mean     1.63    1.08
sd       0.42    0.51
n        8       8

2.36 We plot the distribution of I and U pod weights using a dot-plot from MINITAB.
[Figure: MINITAB dot plots of pod weight for I and U plants; common horizontal axis from 0.70 to 2.45]

2.37 Although there is some overlap in the distributions, it appears that the I plants tend to have higher pod weights than the U plants. We will discuss t tests in Chapter 8 to assess whether there are "statistically significant" differences in mean pod weights between the 2 groups.

2.38-2.40 For lumbar spine bone mineral density, we have the following:

ID         A        B        C               PY Diff   Pack Year Group
1002501   -0.05    0.785     -6.36942675     13.75     2
1015401   -0.12    0.95     -12.6315789      48        5
1027601   -0.24    0.63     -38.0952381      20.5      3
1034301    0.04    0.83       4.81927711     29.75     3
1121202   -0.19    0.685    -27.7372263      25        3
1162502   -0.03    0.845     -3.55029586      5        1
1188701   -0.08    0.91      -8.79120879     42        5
1248202   -0.1     0.71     -14.084507       15        2
1268301    0.15    0.905     16.5745856       9.5      1
1269402   -0.12    0.95     -12.6315789      39        4
1273101   -0.1     0.81     -12.345679       14.5      2
1323501    0.09    0.755     11.9205298      23.25     3
1337102   -0.08    0.67     -11.9402985      18.5      2
1467301   -0.07    0.665    -10.5263158      39        4
1479401   -0.03    0.715     -4.1958042      25.5      3
1494101    0.05    0.735      6.80272109      8        1
1497701    0.04    0.75       5.33333333     10        2
1505502   -0.04    0.81      -4.9382716      32        4
1519402   -0.01    0.645     -1.5503876      13.2      2
1521701   -0.06    0.74      -8.10810811     30        4
1528201   -0.11    0.695    -15.8273381      20.25     3
1536201   -0.05    0.865     -5.78034682     36.25     4
1536701    0.03    0.635      4.72440945     12        2
1541902   -0.12    0.98     -12.244898       11.25     2
1543602    0.03    0.885      3.38983051      8        1
1596702    0.01    0.955      1.04712042     14        2
1597002    0.07    0.705      9.92907801     17.3      2
1597601    0.13    0.775     16.7741935      12        2
1607901   -0.03    0.485     -6.18556701     43.2      5
1608801   -0.21    0.585    -35.8974359      48        5
1628601   -0.05    0.795     -6.28930818      5.35     1
1635901    0.03    0.945      3.17460317      8        1
1637901   -0.05    0.775     -6.4516129       6        1
1640701   -0.01    0.855     -1.16959064     28        3
1643602    0.11    0.555     19.8198198      64.5      5
1647502   -0.07    0.545    -12.8440367      11.3      2
1648701   -0.08    0.94      -8.5106383      15.75     2
1657301   -0.08    0.72     -11.1111111      21        3
1671001   -0.07    0.895     -7.82122905     39        4
1672702    0.1     0.87      11.4942529      18.75     2
2609801   -0.1     0.9      -11.1111111      48        5

Mean      -4.9496682
Median    -6.2893082
Sd        12.4834202

Descriptive Statistics: C

Variable  Pack Year Group   Mean      StDev    Median
C         1                   1.95     8.26      3.17
          2                  -2.18    10.45     -3.96
          3                 -10.17    16.69     -7.65
          4                  -8.30     2.89     -7.96
          5                  -9.13    17.77     -9.95

[Figure: Individual Value Plot of C by Pack Year Group]

It appears that the value of C is generally decreasing as the difference in pack-years gets larger. This suggests that the lumbar spine bone mineral density is smaller in the heavier-smoking twin, which suggests that tobacco use has a negative relationship with bone mineral density.

2.41-2.43 For femoral neck BMD, we find . . .

A        B        C
-0.04    0.7       -5.714285714
-0.1     0.69     -14.49275362
 0.01    0.635      1.57480315
 0.05    0.665      7.518796992
-0.16    0.62     -25.80645161
-0.06    0.53     -11.32075472
-0.05    0.805     -6.211180124
-0.07    0.525    -13.33333333
 0.12    0.71      16.90140845
-0.03    0.885     -3.389830508
 0.04    0.72       5.555555556
-0.09    0.805    -11.18012422
……
 0.04    0.44       9.090909091
-0.05    0.665     -7.518796992
-0.03    0.635     -4.724409449
 0.14    0.64      21.875
 0.12    0.73      16.43835616
-0.09    0.765    -11.76470588

Mean      -0.466252903
Median    -2.941176471
Sd        14.16185979

Descriptive Statistics: C_Fem

Variable  Pack Year Group   Mean     StDev    Median
C_Fem     1                  4.68    11.38     7.87
          2                  4.51    14.83     3.68
          3                 -4.78    11.44    -4.76
          4                 -3.56    14.05    -5.36
          5                 -9.24    16.00    -8.99

[Figure: Individual Value Plot of C_Fem by Pack Year Group]

We get the same overall impression as before, that BMD decreases as tobacco use increases. The relationship may be a bit stronger using the femoral neck measurements, as we see a difference of approximately 14 units (4.68 − (−9.24)) in the mean value of C between Pack Year Group 1 (under 10 py) and Pack Year Group 5 (40 py and above). Using the lumbar spine data, this difference was approximately 11 units.

2.44-2.46 Using femoral shaft BMD, we find the following:

A        B        C
 0.04    1.02       3.921568627
 0.12    1.05      11.42857143
-0.19    0.955    -19.89528796
-0.09    1.075     -8.372093023
-0.18    1.05     -17.14285714
-0.07    1.095     -6.392694064
 0.07    1.195      5.857740586
-0.01    1.045     -0.956937799
 0.08    1.11       7.207207207
……
-0.1     1.17      -8.547008547
-0.08    1.01      -7.920792079
-0.03    0.875     -3.428571429
-0.04    0.68      -5.882352941
 0.1     1.16       8.620689655
-0.2     1.32     -15.15151515
-0.03    1.045     -2.870813397
-0.04    1.04      -3.846153846
 0.06    1.28       4.6875

Mean      -3.241805211
Median    -2.870813397
Sd        11.29830441

Descriptive Statistics: C_Shaft

Variable  Pack Year Group   Mean     StDev    Median
C_Shaft   1                 -0.98     7.67    -2.74
          2                  0.25     6.49     1.03
          3                 -8.55     9.77    -9.40
          4                 -1.92    11.03    -3.80
          5                 -8.26    21.61     0.63

[Figure: Individual Value Plot of C_Shaft by Pack Year Group]

When using the femoral shaft BMD data, the relationship between BMD and tobacco is much less clear. The lowest mean (and median) C value occurs in group 3, and it is hard to tell if any relationship exists between pack-year group and C.
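The group comparison quoted for the lumbar spine data (a gap of roughly 11 units between Pack Year Groups 1 and 5) can be checked from the table in 2.38-2.40. This is a sketch, not the manual's code: it assumes C is A expressed as a percentage of B, which is what the tabulated values imply, and it copies only the rows for groups 1 and 5. The data frame name bmd and its column names are illustrative.

```r
# Lumbar spine rows for Pack Year Groups 1 and 5, copied from the table in
# 2.38-2.40. A and B are the table's first two data columns.
bmd <- data.frame(
  A = c(-0.03, 0.15, 0.05, 0.03, -0.05, 0.03, -0.05,      # group 1
        -0.12, -0.08, -0.03, -0.21, 0.11, -0.10),         # group 5
  B = c(0.845, 0.905, 0.735, 0.885, 0.795, 0.945, 0.775,
        0.95, 0.91, 0.485, 0.585, 0.555, 0.90),
  group = c(rep(1, 7), rep(5, 6))
)

# Assumption: C = 100 * A / B, which reproduces the tabulated C column.
bmd$C <- 100 * bmd$A / bmd$B

# Mean and median of C within each pack-year group.
group_mean <- tapply(bmd$C, bmd$group, mean)
group_median <- tapply(bmd$C, bmd$group, median)

# Gap between the lightest and heaviest pack-year groups.
gap <- group_mean[["1"]] - group_mean[["5"]]
```

The same tapply pattern extends to all five groups once the remaining rows of the table are added to the data frame.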
2.47 We first read the data set LVM and show its first observations

> require(xlsx)
> lvm <- read.xlsx(...)   # call truncated in the source; file and sheet arguments not recoverable
> head(lvm)
  ID lvmht27 bpcat gender   age   BMI
1  1  31.281     1      1 17.63 21.45
2  2  36.780     1      2 16.11 19.78
3  6  20.660     1      2 17.03 20.58
4 10  44.222     1      2 11.50 25.34
5 16  23.302     1      1 11.90 17.30
6 20  27.735     1      2 10.47 19.16

We use the R function tapply to calculate the mean of LVMI by blood pressure group

> tapply(lvm$lvmht27, lvm$bpcat, mean)
       1        2        3
29.34266 33.79100 34.11569

2.48 We also use the R function tapply to calculate the geometric mean of LVMI by blood pressure group

> exp(tapply(log(lvm$lvmht27), lvm$bpcat, mean))
       1        2        3
28.60586 33.34814 32.88941

2.49
> boxplot(lvm$lvmht27~lvm$bpcat, main="Box plot of LVMI by blood pressure group")

[Figure: Box plot of LVMI by blood pressure group (groups 1, 2, 3; vertical axis 15 to 50)]

2.50 Since the box plots by blood pressure group are skewed, the geometric mean provides a more appropriate measure of location for this type of data.

CHAPTER 3/PROBABILITY

3.1 $A_1 \cup A_2$ means that at least one parent has influenza.

3.2 $A_1 \cap A_2$ means that both parents have influenza.

3.3 No. Both children can have influenza.

3.4 $A_3 \cup B$ means that at least one child has influenza, because if $A_3$ occurs, then $B$ must occur. Therefore, $A_3 \cup B = B$.

3.5 $A_3 \cap B$ means that the first child has influenza. Therefore, $A_3 \cap B = A_3$.

3.6 $C = A_1 \cup A_2$

3.7 $D = B \cup C$

3.8 $\bar{A}_1$ means that the mother does not have influenza.

3.9 $\bar{A}_2$ means that the father does not have influenza.

3.10 $\bar{C} = \bar{A}_1 \cap \bar{A}_2$

3.11 $\bar{D} = \bar{B} \cap \bar{C}$

3.12-3.15 … Therefore, the events are not independent.

3.16 Let A = {77-year-old man is affected}, B = {76-year-old woman is affected}, C = {82-year-old woman is affected}. It follows that

$$\Pr(A \cap B \cap C) = .049 \times .023 \times .078 = 8.8 \times 10^{-5}$$

3.17 We need to compute $\Pr(B \cup C)$.
From the addition law,

\[ \Pr(B \cup C) = \Pr(B) + \Pr(C) - \Pr(B \cap C) = .023 + .078 - (.023 \times .078) = .099 \]

3.18 We wish to compute $\Pr(A \cup B \cup C)$.

3.19 We wish to compute $\Pr(E)$, the probability that exactly one of the three individuals is affected. Hence

\[ \Pr(E) = 0.049 \times (1 - 0.023) \times (1 - 0.078) + (1 - 0.049) \times 0.023 \times (1 - 0.078) + (1 - 0.049) \times (1 - 0.023) \times 0.078 = 0.0441 + 0.0202 + 0.0725 = 0.1368 \]

3.20 We have

\[ \Pr(\text{affected individual is a woman}) = \frac{.0202 + .0725}{.1368} = \frac{.0926}{.1368} = .677 \]

3.21 We have

\[ \Pr(\text{both affected individuals are women}) = \frac{(1 - .049)(.023)(.078)}{.049(.023)(1 - .078) + .049(1 - .023)(.078) + (1 - .049)(.023)(.078)} = \frac{.00171}{.00104 + .00373 + .00171} = \frac{.00171}{.00648} = .263 \]

3.22

3.23 We have

\[ \Pr(\text{both} < 80 \text{ years old}) = \frac{.049(.023)(1 - .078)}{.00648} = \frac{.00104}{.00648} = .160 \]

$\Pr(\text{man affected} \mid \text{woman affected}) = \frac{.0015}{.023} = .065$. It is higher than the value in Table 3.5 (.049), indicating that these are dependent events.

3.24 $\Pr(\text{woman affected} \mid \text{man affected}) = \frac{.0015}{.049} = .031$. This value is also higher than the unconditional probability in Table 3.5 (.023). If there is some common environmental factor that is associated with Alzheimer's disease, then it would make sense that the conditional probability is higher than the unconditional probability.

3.25 Let $A$ = {man affected}, $B$ = {woman affected}. We have

\[ \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) = .049 + .023 - .0015 = .0705 \]

3.26 Let $\Pr(A)$ denote the overall probability of Alzheimer's disease. We have that

\[ \Pr(A) = \Pr(A \mid 65\text{-}69\ M) \times \Pr(65\text{-}69\ M) + \cdots + \Pr(A \mid 85{+}\ F) \times \Pr(85{+}\ F) = .05 \times .016 + .10 \times .000 + \cdots + .06 \times .279 = .061 \]

Therefore, the expected overall prevalence in the community is 6.1%.
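The arithmetic for the three-person Alzheimer's calculations (Problems 3.17 to 3.23) can be checked with a short R sketch. It assumes, as those solutions do, that the three individuals are affected independently. The variable names are illustrative, and `p_any` (the probability that at least one of the three is affected) is computed here from that independence assumption rather than taken from the solution text.

```r
# Prevalences quoted in the solutions (from Table 3.5)
pA <- 0.049  # 77-year-old man
pB <- 0.023  # 76-year-old woman
pC <- 0.078  # 82-year-old woman

# Assumption: the three individuals are affected independently
p_all    <- pA * pB * pC                        # all three affected
p_B_or_C <- pB + pC - pB * pC                   # at least one woman affected
p_any    <- 1 - (1 - pA) * (1 - pB) * (1 - pC)  # at least one of the three

# Exactly one affected: one term per person
one_A <- pA * (1 - pB) * (1 - pC)
one_B <- (1 - pA) * pB * (1 - pC)
one_C <- (1 - pA) * (1 - pB) * pC
p_one <- one_A + one_B + one_C

# Given exactly one affected, probability that the person is a woman
p_woman_given_one <- (one_B + one_C) / p_one

# Exactly two affected: one term per pair
two_AB <- pA * pB * (1 - pC)
two_AC <- pA * (1 - pB) * pC
two_BC <- (1 - pA) * pB * pC
p_two  <- two_AB + two_AC + two_BC

# Given exactly two affected: both women, and both under 80 years old
p_both_women_given_two   <- two_BC / p_two
p_both_under80_given_two <- two_AB / p_two

print(round(c(B_or_C = p_B_or_C, any = p_any, one = p_one,
              woman_given_one = p_woman_given_one,
              both_women = p_both_women_given_two,
              both_under80 = p_both_under80_given_two), 4))
```

Rounded to three decimals, the conditional probabilities agree with the values .677, .263, and .160 reported in the solutions.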
3.27 The expected number of cases with Alzheimer's disease $= 1000 \times .061 = 61$.

3.28 Let A, B, and C represent influenza status for the 3-, 5-, and 7-year-old, where A = 1 if influenza, A = 0 otherwise, and B and C are defined similarly. We wish to compute $\Pr(A \cup B \cup C)$. However,

\[ \Pr(A \cup B \cup C) = 1 - [1 - \Pr(A)] \times [1 - \Pr(B)] \times [1 - \Pr(C)] = 1 - (1 - 0.0378)(1 - 0.0170)^2 = 1 - 0.9622(0.9830)^2 = 1 - 0.9298 = 0.070 \]

Thus, there is a 7% probability that at least one of the three children gets influenza.

3.29 We use the total probability rule. Let D = 3-4 year-old gets influenza. We have:

\[ \Pr(D) = 0.0378(0.80) + 0.0569(0.20)(2) = 0.0302 + 0.0228 = 0.053 \]

Thus, 5.3% of 3-4 year-olds get influenza.

3.30 Let E = 5-8 year-old gets influenza. We have:

\[ \Pr(E) = 0.0170(0.70) + 0.0515(0.30)(2) = 0.0119 + 0.0309 = 0.043 \]

Thus, 4.3% of 5-8 year-olds get influenza.

3.31 We use Bayes' Theorem. Let V = child is vaccinated, and I = child gets influenza. We wish to compute $\Pr(V \mid I)$. We have:

\[ \Pr(V \mid I) = \frac{\Pr(I \mid V)\Pr(V)}{\Pr(I \mid V)\Pr(V) + \Pr(I \mid \bar{V})\Pr(\bar{V})} \]

From Table 3.7 and the conditions of the problem, for a 5-8 year-old $\Pr(I \mid V) = 0.0170$ and $\Pr(I \mid \bar{V}) = 0.0515 \times 2 = 0.1030$. Also $\Pr(V) = 0.70$ and $\Pr(\bar{V}) = 0.30$. Thus,

\[ \Pr(V \mid I) = \frac{0.0170(0.70)}{0.0170(0.70) + 0.1030(0.30)} = \frac{0.0119}{0.0119 + 0.0309} = \frac{0.0119}{0.0428} = 0.278 \]

Thus, there is only a 28% probability that this child was vaccinated.

3.32 The probability that both siblings are affected is $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

3.33 The probability that exactly one sibling is affected is $2 \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{2}$.

3.34 The probability that neither sibling will be affected is $\frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

3.35 The probability that the younger child is affected should not be influenced by whether or not the older child is affected.
Thus, the probability of the younger child being affected remains at $\frac{1}{2}$.

3.36 The events A, B are independent because whether or not a child is affected does not influence the outcome for other children in the family.

3.37 The probability that both siblings are affected $= \left(\frac{1}{4}\right)^2 = \frac{1}{16}$

3.38 The probability that exactly one sibling is affected $= 2 \times \left(\frac{1}{4}\right) \times \frac{3}{4} = \frac{3}{8}$

3.39 The probability that neither sibling is affected $= \left(\frac{3}{4}\right)^2 = \frac{9}{16}$

3.40 The probability that both siblings are affected $= 0$, because the female sibling cannot get the disease.

3.41 The probability that exactly one sibling is affected $= \frac{1}{2}$, since only the male sibling can be affected.

3.42 The probability that neither is affected $= \frac{1}{2} \times 1 = \frac{1}{2}$

3.43 $\Pr(\text{both affected}) = \left(\frac{1}{2}\right)^2 = \frac{1}{4}$

3.44 $\Pr(\text{exactly one affected}) = 2 \times \left(\frac{1}{2}\right) \times \left(\frac{1}{2}\right) = \frac{1}{2}$

3.45 $\Pr(\text{neither affected}) = \left(\frac{1}{2}\right) \times \left(\frac{1}{2}\right) = \frac{1}{4}$

3.46 Bayes' theorem is used here. Dominant is denoted by DOM, autosomal recessive by AR, and sex-linked by SL. Let A be the event that two male siblings are affected. The posterior probability is given by

\[ \Pr(DOM \mid A) = \frac{\Pr(A \mid DOM) \times \Pr(DOM)}{\Pr(A \mid DOM)\Pr(DOM) + \Pr(A \mid AR)\Pr(AR) + \Pr(A \mid SL)\Pr(SL)} \]

We also know that $\Pr(DOM) = \Pr(AR) = \Pr(SL) = \frac{1}{3}$ from the conditions stated in the problem. Thus,

\[ \Pr(DOM \mid A) = \frac{\Pr(A \mid DOM)}{\Pr(A \mid DOM) + \Pr(A \mid AR) + \Pr(A \mid SL)} \]
Finally, we know from Problems 3.32, 3.37, and 3.43 that

\[ \Pr(A \mid DOM) = \frac{1}{4}, \quad \Pr(A \mid AR) = \frac{1}{16}, \quad \Pr(A \mid SL) = \frac{1}{4} \]

Thus,

\[ \Pr(DOM \mid A) = \frac{\frac{1}{4}}{\frac{1}{4} + \frac{1}{16} + \frac{1}{4}} = \frac{\frac{1}{4}}{\frac{9}{16}} = \frac{4}{9} \]

Similarly,

\[ \Pr(AR \mid A) = \frac{\Pr(A \mid AR)}{\Pr(A \mid DOM) + \Pr(A \mid AR) + \Pr(A \mid SL)} = \frac{\frac{1}{16}}{\frac{9}{16}} = \frac{1}{9} \]

\[ \Pr(SL \mid A) = \frac{\Pr(A \mid SL)}{\Pr(A \mid DOM) + \Pr(A \mid AR) + \Pr(A \mid SL)} = \frac{\frac{1}{4}}{\frac{9}{16}} = \frac{4}{9} \]

Thus, the dominant and sex-linked modes of inheritance are the most likely, with the autosomal recessive mode being less likely.

3.47 Let $B$ = {exactly one of two male siblings is affected}. From Problems 3.33, 3.38, and 3.44,

\[ \Pr(B \mid DOM) = \frac{1}{2}, \quad \Pr(B \mid AR) = \frac{3}{8}, \quad \Pr(B \mid SL) = \frac{1}{2} \]

Thus, from Bayes' theorem, the posterior probabilities are given by

\[ \Pr(DOM \mid B) = \frac{\Pr(B \mid DOM)}{\Pr(B \mid DOM) + \Pr(B \mid AR) + \Pr(B \mid SL)} = \frac{\frac{1}{2}}{\frac{1}{2} + \frac{3}{8} + \frac{1}{2}} = \frac{\frac{1}{2}}{\frac{11}{8}} = \frac{4}{11} \]

\[ \Pr(AR \mid B) = \frac{\Pr(B \mid AR)}{\Pr(B \mid DOM) + \Pr(B \mid AR) + \Pr(B \mid SL)} = \frac{\frac{3}{8}}{\frac{11}{8}} = \frac{3}{11} \]

\[ \Pr(SL \mid B) = \frac{\Pr(B \mid SL)}{\Pr(B \mid DOM) + \Pr(B \mid AR) + \Pr(B \mid SL)} = \frac{\frac{1}{2}}{\frac{11}{8}} = \frac{4}{11} \]

Here the three genetic types are about equally likely.

3.48 Let $C$ = {both one male and one female sibling are affected}. The sex of the siblings is only relevant for sex-linked disease. Thus, from Problems 3.32, 3.37, and 3.40,

\[ \Pr(C \mid DOM) = \frac{1}{4}, \quad \Pr(C \mid AR) = \frac{1}{16}, \quad \Pr(C \mid SL) = 0 \]

Thus,

\[ \Pr(DOM \mid C) = \frac{\Pr(C \mid DOM)}{\Pr(C \mid DOM) + \Pr(C \mid AR) + \Pr(C \mid SL)} = \frac{\frac{1}{4}}{\frac{1}{4} + \frac{1}{16}} = \frac{\frac{1}{4}}{\frac{5}{16}} = \frac{4}{5} \]

\[ \Pr(AR \mid C) = \frac{\Pr(C \mid AR)}{\Pr(C \mid DOM) + \Pr(C \mid AR) + \Pr(C \mid SL)} = \frac{\frac{1}{16}}{\frac{5}{16}} = \frac{1}{5} \]

\[ \Pr(SL \mid C) = 0 \]

3.49 Let $D$ = {male sibling affected, female sibling not affected}.
\[ \Pr(D \mid DOM) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}, \quad \Pr(D \mid AR) = \frac{1}{4} \times \frac{3}{4} = \frac{3}{16}, \quad \Pr(D \mid SL) = \frac{1}{2} \times 1 = \frac{1}{2} \]

Notice that the event D is not the same as the event that exactly one sibling is affected, since we are specifying which of the two siblings is affected. We have

\[ \Pr(DOM \mid D) = \frac{\Pr(D \mid DOM)}{\Pr(D \mid DOM) + \Pr(D \mid AR) + \Pr(D \mid SL)} = \frac{\frac{1}{4}}{\frac{1}{4} + \frac{3}{16} + \frac{1}{2}} = \frac{\frac{1}{4}}{\frac{15}{16}} = \frac{4}{15} \]

\[ \Pr(AR \mid D) = \frac{\Pr(D \mid AR)}{\Pr(D \mid DOM) + \Pr(D \mid AR) + \Pr(D \mid SL)} = \frac{\frac{3}{16}}{\frac{1}{4} + \frac{3}{16} + \frac{1}{2}} = \frac{\frac{3}{16}}{\frac{15}{16}} = \frac{1}{5} \]

\[ \Pr(SL \mid D) = \frac{\Pr(D \mid SL)}{\Pr(D \mid DOM) + \Pr(D \mid AR) + \Pr(D \mid SL)} = \frac{\frac{1}{2}}{\frac{15}{16}} = \frac{8}{15} \]

Thus, in this situation the sex-linked mode of inheritance is the most likely.

3.50

3.51

3.52

3.53 Pr(mother current smoker $\cap$ father current smoker) = Pr(mother current smoker) $\times$ Pr(father current smoker) $= 0.4 \times 0.5 = 0.20$

3.54 Pr(father current smoker $\mid$ mother not current smoker) = Pr(father current smoker) $= 0.5$. This is a conditional probability compared with the joint probability in Problem 3.53.

3.55 Pr(father current smoker $\cap$ mother not current smoker) = Pr(father current smoker) $\times$ Pr(mother not current smoker $\mid$ father current smoker) $= 0.5 \times (1 - 0.6) = 0.20$

3.56 The smoking habits of the parents are not independent random variables because Pr(mother current smoker $\mid$ father current smoker) $= 0.6 \ne$ Pr(mother current smoker $\mid$ father not current smoker) $= 0.2$

3.57 Let $A$ = {child has asthma}, $M$ = {mother current smoker}, $\bar{M}$ = {mother not current smoker}, $F$ = {father current smoker}, $\bar{F}$ = {father not current smoker}. We want $\Pr(A)$.
We have that

\[ \Pr(A) = \Pr(A \mid M \cap F) \times \Pr(M \cap F) + \Pr(A \mid M \cap \bar{F}) \times \Pr(M \cap \bar{F}) + \Pr(A \mid \bar{M} \cap F) \times \Pr(\bar{M} \cap F) + \Pr(A \mid \bar{M} \cap \bar{F}) \times \Pr(\bar{M} \cap \bar{F}) \]

We are given that

\[ \Pr(A \mid M \cap F) = 0.15, \quad \Pr(A \mid M \cap \bar{F}) = 0.13, \quad \Pr(A \mid \bar{M} \cap F) = 0.05, \quad \Pr(A \mid \bar{M} \cap \bar{F}) = 0.04 \]

Also,

\[ \begin{aligned}
\Pr(M \cap F) &= \Pr(F) \times \Pr(M \mid F) = 0.5 \times 0.6 = 0.30 \\
\Pr(M \cap \bar{F}) &= \Pr(\bar{F}) \times \Pr(M \mid \bar{F}) = 0.5 \times 0.2 = 0.10 \\
\Pr(\bar{M} \cap F) &= \Pr(F) \times \Pr(\bar{M} \mid F) = 0.5 \times 0.4 = 0.20 \\
\Pr(\bar{M} \cap \bar{F}) &= \Pr(\bar{F}) \times \Pr(\bar{M} \mid \bar{F}) = 0.5 \times 0.8 = 0.40
\end{aligned} \]

Therefore,

\[ \Pr(A) = .15 \times .30 + .13 \times .10 + .05 \times .20 + .04 \times .40 = .084 \]

3.58 We want to compute $\Pr(F \mid A)$. We have from the definition of conditional probability that

\[ \Pr(F \mid A) = \frac{\Pr(F \cap A)}{\Pr(A)} = \frac{\Pr(F \cap A)}{.084} \]

Furthermore,

\[ \Pr(F \cap A) = \Pr(M \cap F \cap A) + \Pr(\bar{M} \cap F \cap A) = \Pr(A \mid M \cap F) \times \Pr(M \cap F) + \Pr(A \mid \bar{M} \cap F) \times \Pr(\bar{M} \cap F) \]

Referring to Problem 3.57, we note that $\Pr(F \cap A) = .15 \times .30 + .05 \times .20 = .055$. Thus,

\[ \Pr(F \mid A) = \frac{.055}{.084} = .655 \]

3.59 We want to compute $\Pr(M \mid A)$. We have that $\Pr(M \mid A) = \frac{\Pr(M \cap A)}{\Pr(A)}$, where

\[ \Pr(M \cap A) = \Pr(M \cap F \cap A) + \Pr(M \cap \bar{F} \cap A) = \Pr(A \mid M \cap F) \times \Pr(M \cap F) + \Pr(A \mid M \cap \bar{F}) \times \Pr(M \cap \bar{F}) = 0.15 \times 0.30 + 0.13 \times 0.10 = 0.058 \]

Thus,

\[ \Pr(M \mid A) = \frac{.058}{.084} = .690 \]

3.60 We want to compute $\Pr(F \mid \bar{A})$.
We have that $\Pr(F \mid \bar{A}) = \frac{\Pr(F \cap \bar{A})}{\Pr(\bar{A})}$, where

\[ \Pr(F \cap \bar{A}) = \Pr(M \cap F \cap \bar{A}) + \Pr(\bar{M} \cap F \cap \bar{A}) = \Pr(\bar{A} \mid M \cap F) \times \Pr(M \cap F) + \Pr(\bar{A} \mid \bar{M} \cap F) \times \Pr(\bar{M} \cap F) = (1 - 0.15) \times 0.30 + (1 - 0.05) \times 0.20 = 0.445 \]

Thus,

\[ \Pr(F \mid \bar{A}) = \frac{\Pr(F \cap \bar{A})}{\Pr(\bar{A})} = \frac{0.445}{0.916} = 0.486 \]

3.61 We want to compute $\Pr(M \mid \bar{A})$. We have $\Pr(M \mid \bar{A}) = \frac{\Pr(M \cap \bar{A})}{\Pr(\bar{A})}$, where

\[ \Pr(M \cap \bar{A}) = \Pr(M \cap F \cap \bar{A}) + \Pr(M \cap \bar{F} \cap \bar{A}) = \Pr(\bar{A} \mid M \cap F) \times \Pr(M \cap F) + \Pr(\bar{A} \mid M \cap \bar{F}) \times \Pr(M \cap \bar{F}) = (1 - 0.15) \times 0.30 + (1 - 0.13) \times 0.10 = 0.342 \]

Thus,

\[ \Pr(M \mid \bar{A}) = \frac{\Pr(M \cap \bar{A})}{\Pr(\bar{A})} = \frac{0.342}{0.916} = 0.373 \]

3.62 We found in Problem 3.58 that $\Pr(F \mid A) = .655$ and in Problem 3.60 that $\Pr(F \mid \bar{A}) = .486$. Since $\Pr(F \mid A) \ne \Pr(F \mid \bar{A})$, the father's smoking status and the child's asthma status are not independent.

3.63 We found in Problem 3.59 that $\Pr(M \mid A) = .690$ and in Problem 3.61 that $\Pr(M \mid \bar{A}) = 0.373$. Since $\Pr(M \mid A) = 0.690 \ne \Pr(M \mid \bar{A}) = 0.373$, the mother's smoking status and the child's asthma status are not independent.
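The asthma calculations in Problems 3.57 to 3.61 all reuse the same four joint probabilities and four conditional asthma risks, so they can be laid out as two small tables and combined. The R sketch below is one way to do that; the object names (`joint`, `pA_given`, and so on) are illustrative and not part of the original solutions.

```r
# Quantities quoted in Problems 3.55 to 3.57
pF          <- 0.5  # Pr(father current smoker)
pM_given_F  <- 0.6  # Pr(mother smokes | father smokes)
pM_given_nF <- 0.2  # Pr(mother smokes | father does not smoke)

# Joint probabilities of parental smoking (rows: mother, columns: father)
joint <- matrix(c(pF * pM_given_F,       (1 - pF) * pM_given_nF,
                  pF * (1 - pM_given_F), (1 - pF) * (1 - pM_given_nF)),
                nrow = 2, byrow = TRUE,
                dimnames = list(mother = c("M", "notM"),
                                father = c("F", "notF")))

# Pr(child has asthma | parental smoking status), same layout
pA_given <- matrix(c(0.15, 0.13,
                     0.05, 0.04),
                   nrow = 2, byrow = TRUE, dimnames = dimnames(joint))

# Total probability rule (Problem 3.57)
pA <- sum(pA_given * joint)

# Conditional probabilities (Problems 3.58 to 3.61)
pF_given_A    <- sum(pA_given[, "F"] * joint[, "F"]) / pA
pM_given_A    <- sum(pA_given["M", ] * joint["M", ]) / pA
pF_given_notA <- sum((1 - pA_given[, "F"]) * joint[, "F"]) / (1 - pA)
pM_given_notA <- sum((1 - pA_given["M", ]) * joint["M", ]) / (1 - pA)

print(round(c(pA = pA, F_given_A = pF_given_A, M_given_A = pM_given_A,
              F_given_notA = pF_given_notA, M_given_notA = pM_given_notA), 3))
```

Keeping the joint probabilities in one matrix makes it easy to confirm that they sum to 1 and that each conditional probability draws on the intended column (father) or row (mother).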

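The posterior probabilities in Problems 3.46 to 3.49 all follow the same pattern: equal priors of 1/3 for the three modes of inheritance, multiplied by the likelihood of the observed sibling outcome and then normalized. A minimal R sketch of that pattern follows; the helper `posterior` and the matrix `lik` are hypothetical names introduced here for illustration.

```r
# Equal prior probabilities for the three modes of inheritance
prior <- c(DOM = 1/3, AR = 1/3, SL = 1/3)

# Likelihood of each sibling outcome under each mode, as quoted in the solutions
lik <- rbind(
  A = c(DOM = 1/4, AR = 1/16, SL = 1/4),  # two male siblings, both affected
  B = c(DOM = 1/2, AR = 3/8,  SL = 1/2),  # two male siblings, exactly one affected
  C = c(DOM = 1/4, AR = 1/16, SL = 0),    # male and female sibling, both affected
  D = c(DOM = 1/4, AR = 3/16, SL = 1/2)   # male affected, female not affected
)

# Bayes' theorem: posterior is proportional to likelihood times prior
posterior <- function(likelihood, prior) {
  unnorm <- likelihood * prior
  unnorm / sum(unnorm)
}

# One row of posterior probabilities per event
post <- t(apply(lik, 1, posterior, prior = prior))
print(round(post, 3))
```

Because the priors are equal, they cancel in the normalization, which is why the solutions work directly with the likelihoods.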