Solution Manual for A Second Course in Statistics: Regression Analysis, 8th Edition

Chapter 3: Simple Linear Regression

3.1
a. b. c. d.

3.2
Since the line passes through the point (0, 1), $1 = \beta_0 + \beta_1(0) \Rightarrow \beta_0 = 1$. Since it also passes through the point (2, 3), $3 = \beta_0 + \beta_1(2) \Rightarrow 3 = 1 + 2\beta_1 \Rightarrow \beta_1 = 1$. Thus, $y = 1 + x$.

3.3
a. Using the technique explained in Exercise 3.2: $2 = \beta_0 + \beta_1(0)$ and $6 = \beta_0 + \beta_1(2)$ give $\beta_0 = 2$ and $\beta_1 = 2$, so $y = 2 + 2x$.
b. $4 = \beta_0 + \beta_1(0)$ and $6 = \beta_0 + \beta_1(2)$ give $\beta_0 = 4$ and $\beta_1 = 1$, so $y = 4 + x$.
c. $-2 = \beta_0 + \beta_1(0)$ and $-6 = \beta_0 + \beta_1(-1)$ give $\beta_0 = -2$ and $\beta_1 = 4$, so $y = -2 + 4x$.
d. $-4 = \beta_0 + \beta_1(0)$ and $-7 = \beta_0 + \beta_1(3)$ give $\beta_0 = -4$ and $\beta_1 = -1$, so $y = -4 - x$.

Copyright © 2020 Pearson Education, Inc.

3.4
a. b. c. d. e.

3.5
      Slope ($\beta_1$)   y-intercept ($\beta_0$)
a.    2                   3
b.    1                   1
c.    3                   2
d.    5                   0
e.    2                   4

3.6
Some preliminary calculations are:
$\sum x = 21$, $\sum x^2 = 91$, $\sum y = 18$, $\sum y^2 = 68$, $\sum xy = 78$
a. $\bar{x} = 21/6 = 3.5$ and $\bar{y} = 18/6 = 3$
$SS_{xx} = \sum x^2 - n\bar{x}^2 = 91 - 6(3.5)^2 = 17.5$
$SS_{xy} = \sum xy - n\bar{x}\bar{y} = 78 - 6(3.5)(3) = 15$
$SS_{yy} = \sum y^2 - n\bar{y}^2 = 68 - 6(3)^2 = 14$
$\hat\beta_1 = SS_{xy}/SS_{xx} = 15/17.5 = 0.8571$
$\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = 3 - 0.8571(3.5) \approx 0$
b.

3.7
a. To compute $\hat\beta_0$ and $\hat\beta_1$, we first construct the following table:

   x     y    x^2    xy    y^2
  -2     4     4     -8    16
  -1     3     1     -3     9
   0     3     0      0     9
   1     1     1      1     1
   2    -1     4     -2     1
Totals: $\sum x = 0$, $\sum y = 10$, $\sum x^2 = 10$, $\sum xy = -12$, $\sum y^2 = 36$

Then,
$SS_{xx} = \sum x^2 - (\sum x)^2/n = 10 - (0)^2/5 = 10$
$SS_{xy} = \sum xy - (\sum x)(\sum y)/n = -12 - 0(10)/5 = -12$
$SS_{yy} = \sum y^2 - (\sum y)^2/n = 36 - (10)^2/5 = 16$
$\bar{y} = \sum y/n = 10/5 = 2$ and $\bar{x} = \sum x/n = 0/5 = 0$
Thus, the least squares estimates of $\beta_0$ and $\beta_1$ are:
$\hat\beta_1 = SS_{xy}/SS_{xx} = -12/10 = -1.2$
$\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = 2 - (-1.2)(0) = 2$
and the equation of the least squares prediction line is $\hat{y} = 2 - 1.2x$.
b.

3.8
a. $y = \beta_0 + \beta_1 x + \varepsilon$
b. Yes, since the data appear to demonstrate a straight-line relationship.
c. Sales_Price = 1.4 + 1.41 Market_Val
d. 
$\hat\beta_0 = 1.4$. Since x = 0 (no market value) is not meaningful, the y-intercept has no practical interpretation.
e. Answers will vary. One possible range over which the slope interpretation is valid is $100,000 < x < $1,000,000.
f. "mean sale price" = 1.4 + 1.41($300,000) ≈ $423,000

3.9
a. Yes, there appears to be a positive linear trend. As the height above the horizon increases, the angular size tends to increase.
b & c. A sketch (answers can vary) of the line, with deviations drawn to the sketched line:

[Scatterplot of ANGLE vs HEIGHT with sketched line]

The estimated deviations and squared deviations are:

ANGLE   HEIGHT   Est Fit    Dev    Sq Dev
321.9     17      322.2    -0.3     0.09
322.3     18      322.3     0.0     0.00
322.4     26      323.0    -0.6     0.36
323.2     32      323.4    -0.2     0.04
323.4     38      323.9    -0.5     0.25
324.4     42      324.2     0.2     0.04
325.0     49      324.8     0.2     0.04
325.7     52      325.0     0.7     0.49
325.8     57      325.4     0.4     0.16
325.0     60      325.7    -0.7     0.49
326.9     63      325.9     1.0     1.00
326.0     67      326.2    -0.2     0.04
325.8     73      326.7    -0.9     0.81
                           Total    3.81

The sum of the squared deviations is 3.81.
d. From the sketched line, the y-intercept is about 321 and the slope is about 0.1. These are close to the y-intercept, 320.636, and slope, 0.083, of the regression line.
e. From the printout, the SSE is 3.56465. The sum of squares from the sketched line is 3.81. The SSE from the regression line is smaller.

3.10
a. Using MINITAB, the results are:

Analysis of Variance
Source         DF   Adj SS   Adj MS   F-Value   P-Value
Regression      1   677.45   677.45     24.41     0.003
  VO2Max        1   677.45   677.45     24.41     0.003
Error           6   166.55    27.76
  Lack-of-Fit   5   142.05    28.41      1.16     0.604
  Pure Error    1    24.50    24.50
Total           7   844.00

Coefficients
Term        Coef   SE Coef   T-Value   P-Value    VIF
Constant   -27.2      19.8     -1.38     0.217
VO2Max     0.558     0.113      4.94     0.003   1.00

Regression Equation
HR% = -27.2 + 0.558 VO2Max

The least squares line is $\hat{y} = -27.2 + 0.558x$.
b. 
Since 0 is not in the range of observed values of VO2Max, the y-intercept does not have a practical interpretation.
c. $\hat\beta_1 = 0.558$. For each unit increase in the value of VO2Max, the mean HR% is estimated to increase by 0.558.

3.11
a. No, there does not appear to be any trend for cooperation use versus the average payoff.
b. No, there does not appear to be any trend for defective use versus the average payoff.
c. Yes, there appears to be somewhat of a linear relationship between average payoff and punishment use.
d. Negative relationship; as punishment use increases, the average payoff decreases.
e. Yes, winners tend to punish less than non-winners.

3.12
a. Using MINITAB, some calculations are:

Analysis of Variance
Source         DF    Adj SS    Adj MS   F-Value   P-Value
Regression      1   6083.84   6083.84     26.35     0.000
  Year          1   6083.84   6083.84     26.35     0.000
Error          10   2309.07    230.91
  Lack-of-Fit   9   2301.07    255.67     31.96     0.136
  Pure Error    1      8.00      8.00
Total          11   8392.92

Model Summary
      S     R-sq   R-sq(adj)   R-sq(pred)
15.1956   72.49%      69.74%       54.61%

Coefficients
Term        Coef   SE Coef   T-Value   P-Value    VIF
Constant   -3675       724     -5.08     0.000
Year       1.870     0.364      5.13     0.000   1.00

Regression Equation
Cost = -3675 + 1.870 Year

The least squares line is $\hat{y} = -3675 + 1.870x$.
b. Since 0 is not in the range of observed values of Year, the y-intercept does not have a practical interpretation.
c. $\hat\beta_1 = 1.87$. For each one-year increase, the mean cost is estimated to increase by 1.87 million dollars.

3.13
a. Some preliminary calculations are:
$n = 24$, $\sum x = 6167$, $\sum y = 135.8$, $\sum x^2 = 1{,}641{,}115$, $\sum y^2 = 769.72$, $\sum xy = 34{,}765$
$SS_{xy} = \sum xy - (\sum x)(\sum y)/n = 34{,}765 - (6167)(135.8)/24 = -129.94167$
$SS_{xx} = \sum x^2 - (\sum x)^2/n = 1{,}641{,}115 - (6167)^2/24 = 56{,}452.958$
$\hat\beta_1 = SS_{xy}/SS_{xx} = -129.94167/56{,}452.958 = -0.002301769 \approx -0.0023$
$\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = 135.8/24 - (-0.002301769)(6167/24) = 6.249792065 \approx 6.25$
The least squares line is $\hat{y} = 6.25 - 0.0023x$.
b. 
$\hat\beta_0 = 6.25$. Since x = 0 is not in the observed range, $\hat\beta_0$ has no interpretation other than being the y-intercept. $\hat\beta_1 = -0.0023$. For each additional 1 part per million of pectin, the mean sweetness index is estimated to decrease by 0.0023.
c. $\hat{y} = 6.25 - 0.0023(300) = 5.56$.

3.14
a. Using MINITAB, some preliminary results are:

Analysis of Variance
Source         DF    Adj SS   Adj MS   F-Value   P-Value
Regression      1      9.08    9.080      0.18     0.676
  CDIFF         1      9.08    9.080      0.18     0.676
Error          22   1116.78   50.763
  Lack-of-Fit  21   1026.06   48.860      0.54     0.813
  Pure Error    1     90.72   90.720
Total          23   1125.86

Coefficients
Term         Coef   SE Coef   T-Value   P-Value    VIF
Constant    49.57      1.56     31.76     0.000
CDIFF      0.0275    0.0650      0.42     0.676   1.00

Regression Equation
VSHARE = 49.57 + 0.0275 CDIFF

The least squares line is $\hat{y} = 49.57 + 0.0275x$.
b. Using MINITAB, the scatterplot is:

[Fitted Line Plot: VSHARE = 49.57 + 0.02748 CDIFF; S = 7.12481, R-Sq = 0.8%, R-Sq(adj) = 0.0%]

There does not appear to be much of a linear relationship between Democratic vote share and charisma difference. There might be a slight positive linear trend.
c. $\hat\beta_1 = 0.0275$. For each unit increase in charisma difference, the mean Democratic vote share is estimated to increase by 0.0275 points.

3.15
Some preliminary calculations are:
$\bar{y} = \sum y/n = 103.07/144 = 0.71576$
$\bar{x} = \sum x/n = 792/144 = 5.5$
$SS_{xy} = \sum xy - (\sum x)(\sum y)/n = 586.86 - 792(103.07)/144 = 19.975$
$SS_{xx} = \sum x^2 - (\sum x)^2/n = 5{,}112 - 792^2/144 = 756$
$\hat\beta_1 = SS_{xy}/SS_{xx} = 19.975/756 = 0.026421957$
$\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = 103.07/144 - (0.026421957)(792/144) = 0.570443121$
The estimated regression line is $\hat{y} = 0.5704 + 0.0264x$. Since x = 0 is nonsensical, there is no practical interpretation of $\hat\beta_0 = 0.5704$. For each one-position increase in order, the estimated recall proportion increases by $\hat\beta_1 = 0.0264$.

3.16
The scatterplot in this problem clearly shows a significantly nonlinear trend. Therefore, the linear model is not the best to describe the data in this scatterplot.

[Scatterplot of Mass vs Time]

Analysis of Variance
Source       DF   Adj SS    Adj MS   F-Value   P-Value
Regression    1    89.79   89.7942    122.19     0.000
  Time        1    89.79   89.7942    122.19     0.000
Error        21    15.43    0.7349
Total        22   105.23

Coefficients
Term          Coef   SE Coef   T-Value   P-Value    VIF
Constant     5.221     0.296     17.64     0.000
Time       -0.1140    0.0103    -11.05     0.000   1.00

Regression Equation
Mass = 5.221 - 0.1140 Time

The fitted regression line is $\hat{y} = 5.221 - 0.1140x$. Since the coefficient of time is negative, there is evidence that the mass of the spill tends to decrease as time increases. For each one-minute increase in time, the mean mass is estimated to diminish by 0.1140 pounds.

3.17
a. Using MINITAB, the scatterplot of the data is:

[Scatterplot of AACC vs AAFEMA]

There does not appear to be any trend in the plot.
b. Using MINITAB, the results are:

Analysis of Variance
Source         DF    Adj SS    Adj MS   F-Value   P-Value
Regression      1   0.06200   0.06200      2.79     0.102
  AAFEMA        1   0.06200   0.06200      2.79     0.102
Error          48   1.06817   0.02225
  Lack-of-Fit  36   0.92617   0.02573      2.17     0.075
  Pure Error   12   0.14200   0.01183
Total          49   1.13016

Coefficients
Term          Coef   SE Coef   T-Value   P-Value    VIF
Constant    0.2489    0.0292      8.52     0.000
AAFEMA     0.00542   0.00324      1.67     0.102   1.00

Regression Equation
AACC = 0.2489 + 0.00542 AAFEMA

The least squares line is $\hat{y} = 0.2489 + 0.00542x$. The estimated y-intercept is $\hat\beta_0 = 0.2489$ and the estimated slope is $\hat\beta_1 = 0.00542$.
c. $\hat\beta_0 = 0.2489$. Since 0 is not in the observed range of the average annual FEMA relief, the y-intercept has no practical interpretation.
$\hat\beta_1 = 0.00542$. For each unit increase in the average annual FEMA relief, the mean average annual number of public corruption convictions is estimated to increase by 0.00542 per 100,000 residents.

3.18
a. $s^2 = \dfrac{SSE}{n-2} = \dfrac{0.219}{9-2} = 0.0313$
b. $s = \sqrt{0.0313} = 0.1769$

3.19
a. Using data from Exercise 3.6,
$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 14 - 0.8571(15) = 1.1435$
$s^2 = \dfrac{SSE}{n-2} = \dfrac{1.1435}{6-2} = 0.2859$
$s = \sqrt{0.2859} = 0.5347$
b. Using data from Exercise 3.7,
$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 16 - (-1.2)(-12) = 1.6$
$s^2 = \dfrac{SSE}{n-2} = \dfrac{1.6}{5-2} = 0.5333$
$s = \sqrt{0.5333} = 0.7303$

3.20
a. $s^2 = \dfrac{SSE}{n-2} = \dfrac{1.04}{28-2} = 0.04$ and $s = \sqrt{0.04} = 0.2$
b. We would expect most of the observed values to fall within 2s or 2(0.2) = 0.4 units of the least squares line.

3.21
a. $y = \beta_0 + \beta_1 x + \varepsilon$
b. The least squares line is $\hat{y} = 120 + 0.3456x$.
c. Assumption 1: The mean of the probability distribution of $\varepsilon$ is 0.
Assumption 2: The variance of the probability distribution of $\varepsilon$ is constant for all settings of the independent variable x.
Assumption 3: The probability distribution of $\varepsilon$ is normal.
Assumption 4: The errors associated with any two different observations are independent.
d. $s = 635.187$
e. $\hat{y} \pm 2s \Rightarrow \hat{y} \pm 2(635.187) \Rightarrow \hat{y} \pm 1270.374$

3.22
a. From Exercise 3.12, $s = 15.1956$.
b. We would expect most of the observed values to fall within 2s or 2(15.1956) = 30.3912 units of the least squares line.

3.23
a. Using calculations from Exercise 3.13,
$SS_{yy} = \sum y^2 - (\sum y)^2/n = 769.72 - (135.8)^2/24 = 1.318333$
$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 1.318333 - (-0.002301769)(-129.94167) = 1.01924$
$s^2 = \dfrac{SSE}{n-2} = \dfrac{1.01924}{24-2} = 0.0463$
$s = \sqrt{0.0463} = 0.2152$
b. The units of measure for $s^2$ are square units. It is very difficult to interpret units such as dollars squared, minutes squared, etc.
c. We would expect most of the observed values to fall within 2s or 2(0.2152) = 0.4304 units of the least squares line.
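
The hand calculations in Exercises 3.13 and 3.23 can be checked numerically. The following is a minimal Python sketch that assumes only the summary sums quoted in Exercise 3.13; the function name `fit_from_sums` and the dictionary keys are illustrative choices, not part of the text.

```python
import math


def fit_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Least squares fit and error variance from summary sums (illustrative helper)."""
    ss_xx = sum_x2 - sum_x ** 2 / n       # SS_xx = sum(x^2) - (sum x)^2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n       # SS_yy = sum(y^2) - (sum y)^2 / n
    ss_xy = sum_xy - sum_x * sum_y / n    # SS_xy = sum(xy) - (sum x)(sum y) / n
    b1 = ss_xy / ss_xx                    # slope estimate
    b0 = sum_y / n - b1 * sum_x / n       # intercept estimate: ybar - b1 * xbar
    sse = ss_yy - b1 * ss_xy              # SSE = SS_yy - b1 * SS_xy
    s2 = sse / (n - 2)                    # estimate of sigma^2
    return {"ss_xx": ss_xx, "ss_yy": ss_yy, "ss_xy": ss_xy,
            "b0": b0, "b1": b1, "sse": sse, "s2": s2, "s": math.sqrt(s2)}


# Summary sums quoted in Exercise 3.13 (pectin and sweetness index data)
fit = fit_from_sums(n=24, sum_x=6167, sum_y=135.8,
                    sum_x2=1_641_115, sum_y2=769.72, sum_xy=34_765)

for name, value in fit.items():
    print(f"{name}: {value:.6f}")
```

The printed slope, intercept, SSE, and s should agree with the values worked out by hand in Exercises 3.13 and 3.23, up to rounding.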

3.24
a. The estimate of $\sigma^2$ is $s^2 = \dfrac{SSE}{n-2} = \dfrac{1.06817}{50-2} = 0.02225$.
b. The estimate of $\sigma$ is $s = \sqrt{0.02225} = 0.1492$.
c. The estimate of $\sigma$ can be interpreted practically because it is measured in the same units as the data. The units of measure of $\sigma^2$ are square units.
d. We would expect most of the observed values to fall within 2s or 2(0.1492) = 0.2984 units of the least squares line. In this problem, s is measured in the units of y, convictions per 100,000 residents. However, looking at the scatterplot, the data do not fall close to a straight line. The model will not be very accurate in predicting a state's average annual number of public corruption convictions.

3.25
a. The least squares line with the steepest slope is with the pair AB Magnitude Alert and AB Magnitude No-Tone.
b. The least squares line that produces the largest SSE is with the pair AB Magnitude Alert and AB Magnitude No-Tone.
c. The least squares line that produces the smallest estimate of $\sigma$ is with the pair AB Magnitude Sim and AB Magnitude Alert.

3.26
a. To determine if $\beta_1$ differs from 0, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 \ne 0$
The test statistic is $t = \dfrac{\hat\beta_1}{s/\sqrt{SS_{xx}}} = \dfrac{0.8571}{0.5345/\sqrt{17.5}} = 6.71$
The rejection region requires $\alpha/2 = 0.05/2 = 0.025$ in each tail of the t distribution. From Table 2, Appendix D, with df = n - 2 = 6 - 2 = 4, $t_{0.025} = 2.776$. The rejection region is $t < -2.776$ or $t > 2.776$.
Since the observed value of the test statistic falls in the rejection region (t = 6.71 > 2.776), $H_0$ is rejected. There is sufficient evidence to indicate that x contributes information for the prediction of y using a linear model at $\alpha = .05$.
b. To determine if $\beta_1$ differs from 0, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 \ne 0$
The test statistic is $t = \dfrac{\hat\beta_1}{s/\sqrt{SS_{xx}}} = \dfrac{-1.2}{0.7303/\sqrt{10}} = -5.20$
The rejection region requires $\alpha/2 = 0.05/2 = 0.025$ in each tail of the t distribution. From Table 2, Appendix D, with df = n - 2 = 5 - 2 = 3, $t_{0.025} = 3.182$. The rejection region is $t < -3.182$ or $t > 3.182$.
Since the observed value of the test statistic falls in the rejection region (t = -5.20 < -3.182), $H_0$ is rejected. There is sufficient evidence to indicate that x contributes information for the prediction of y using a linear model at $\alpha = .05$.

3.27
a. To determine if there is a positive linear relationship between appraised property value and sale price, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 > 0$
From the printout, the test statistic is t = 38.132 and the p-value is p = 0.000/2 = 0.000. Since the p-value is less than $\alpha$ (p = 0.000 < 0.01), $H_0$ is rejected. There is sufficient evidence to indicate there is a positive linear relationship between appraised property value and sale price at $\alpha = 0.01$.
b. From the printout, the 95% confidence interval is (1.335, 1.482). We are 95% confident that for each $1,000 increase in market value, the mean sale price is estimated to increase by between $1,335 and $1,482.
c. In order to obtain a narrower confidence interval, one could lower the confidence level (e.g., to 90%) or increase the sample size.

3.28
Some preliminary calculations are:
$SS_{yy} = \sum y^2 - (\sum y)^2/n = 769.72 - (135.8)^2/24 = 1.3183333$
$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 1.3183333 - (-0.002301769)(-129.94167) = 1.019237592$
$s^2 = \dfrac{SSE}{n-2} = \dfrac{1.019237592}{22} = 0.046329$
$s_{\hat\beta_1} = \sqrt{\dfrac{s^2}{SS_{xx}}} = \sqrt{\dfrac{0.046329}{56{,}452.958}} = 0.000906$
For confidence level 0.90, $\alpha = 0.10$ and $\alpha/2 = 0.10/2 = 0.05$. From Table 2, Appendix D, with df = n - 2 = 24 - 2 = 22, $t_{0.05} = 1.717$. The confidence interval is:
$\hat\beta_1 \pm t_{0.05}\, s_{\hat\beta_1} \Rightarrow -0.0023 \pm 1.717(0.000906) \Rightarrow (-0.0039, -0.0007)$
We are 90% confident that the change in the mean sweetness index for each one-unit increase in pectin is between -0.0039 and -0.0007.

3.29
a. The equation for the simple linear regression is $y = \beta_0 + \beta_1 x + \varepsilon$.
b. The value of $\beta_0$ is probably irrelevant. By definition, $\beta_0$ is the mean value of entitlement score for those whose helicopter parent score is 0. We would expect $\beta_1$ to be positive. As the helicopter parent score increases, the entitlement score increases.
c. Since the p-value is less than $\alpha$ (p = 0.002 < 0.01), $H_0$ is rejected. 
There is sufficient evidence to indicate there is a positive linear relationship between entitlement scores and helicopter parent score at $\alpha = 0.01$.

3.30
For confidence level 0.95, $\alpha = 0.05$ and $\alpha/2 = 0.05/2 = 0.025$. From Table 2, Appendix D, with df = n - 2 = 50 - 2 = 48, $t_{0.025} \approx 2.021$. The confidence interval is:
$\hat\beta_1 \pm t_{0.025}\, s_{\hat\beta_1} \Rightarrow 0.00542 \pm 2.021(0.00324) \Rightarrow (-0.0011, 0.0120)$
We are 95% confident that the increase in the mean state's average annual number of public corruption convictions is between -0.0011 and 0.0120 for each unit increase in the state's average annual FEMA relief.

3.31
a. The equation for the simple linear regression is $y = \beta_0 + \beta_1 x + \varepsilon$.
b. The y-intercept does not have any meaning because 0 cannot be in the range of observed beauty index.
c. For each unit increase in the beauty index, the mean relative success is estimated to increase by 22.91 points.
d. To determine if the slope of the line is positive, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 > 0$
The test statistic is $t = \dfrac{\hat\beta_1}{s_{\hat\beta_1}} = \dfrac{22.91}{3.73} = 6.14$.
The rejection region requires $\alpha = 0.01$ in the upper tail of the t distribution. From Table 2, Appendix D, with df = n - 2 = 641 - 2 = 639, $t_{0.01} = 2.326$. The rejection region is $t > 2.326$.
Since the observed value of the test statistic falls in the rejection region (t = 6.14 > 2.326), $H_0$ is rejected. There is sufficient evidence to indicate the slope of the line is positive at $\alpha = 0.01$. There is evidence to indicate that as the beauty index increases, the relative success also increases.

3.32
To determine if the simple linear regression model is useful for predicting Democratic vote share, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 \ne 0$
The test statistic is $t = \dfrac{\hat\beta_1}{s_{\hat\beta_1}} = \dfrac{0.0275}{0.0650} = 0.42$ and the p-value is p = 0.676. (From Exercise 3.14)
Since the p-value is not less than $\alpha$ (p = 0.676 > 0.10), $H_0$ is not rejected. 
There is insufficient evidence to indicate the simple linear regression model is useful for predicting Democratic vote share at $\alpha = 0.10$.

3.33
Using the calculations from Exercise 3.15 and these calculations:
$SS_{yy} = \sum y^2 - (\sum y)^2/n = 83.474 - 103.07^2/144 = 9.70021597$
$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 9.70021597 - (0.026421957)(19.975) = 9.172437366$
$s^2 = \dfrac{SSE}{n-2} = \dfrac{9.172437366}{144-2} = 0.064594629$
$s = \sqrt{s^2} = \sqrt{0.064594629} = 0.254154735$
To determine if there is a linear trend between the proportion of names recalled and position, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 \ne 0$
The test statistic is $t = \dfrac{\hat\beta_1 - 0}{s_{\hat\beta_1}} = \dfrac{\hat\beta_1}{s/\sqrt{SS_{xx}}} = \dfrac{0.02642 - 0}{0.25415/\sqrt{756}} = 2.86$
The rejection region requires $\alpha/2 = 0.01/2 = 0.005$ in each tail of the t distribution. From Table 2, Appendix D, with df = n - 2 = 144 - 2 = 142, $t_{0.005} \approx 2.576$. The rejection region is $t < -2.576$ or $t > 2.576$.
Since the observed test statistic falls in the rejection region (t = 2.86 > 2.576), $H_0$ is rejected. There is sufficient evidence to indicate the proportion of names recalled is linearly related to position at $\alpha = .01$.

3.34
a. To determine if the spill mass tends to diminish linearly as time increases, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 < 0$
Using information from Exercise 3.16, the test statistic is t = -11.05 and the p-value is p = 0.000/2 = 0.000. Since the p-value is less than $\alpha$ (p = 0.000 < 0.05), $H_0$ is rejected. There is sufficient evidence to indicate the spill mass tends to diminish linearly as time increases at $\alpha = 0.05$.
b. 
Using MINITAB, the 95% confidence intervals are:

Fits and Diagnostics for All Observations
Obs   Time    Mass      Fit   SE Fit   95% CI
  1      0   6.640    5.221    0.296   (4.605, 5.836)
  2      1   6.340    5.107    0.288   (4.508, 5.705)
  3      2   6.040    4.993    0.280   (4.411, 5.575)
  4      4   5.470    4.765    0.264   (4.215, 5.314)
  5      6   4.940    4.537    0.249   (4.018, 5.055)
  6      8   4.440    4.309    0.236   (3.819, 4.798)
  7     10   3.980    4.080    0.223   (3.617, 4.544)
  8     12   3.550    3.852    0.211   (3.414, 4.291)
  9     14   3.150    3.624    0.201   (3.207, 4.042)
 10     16   2.790    3.396    0.192   (2.996, 3.796)
 11     18   2.450    3.168    0.186   (2.782, 3.554)
 12     20   2.140    2.940    0.181   (2.563, 3.317)
 13     22   1.860    2.712    0.179   (2.340, 3.084)
 14     24   1.600    2.484    0.179   (2.112, 2.857)
 15     26   1.370    2.256    0.182   (1.878, 2.634)
 16     28   1.170    2.028    0.186   (1.640, 2.416)
 17     30   0.980    1.800    0.193   (1.398, 2.202)
 18     35   0.600    1.230    0.218   (0.776, 1.684)
 19     40   0.340    0.660    0.251   (0.137, 1.182)
 20     45   0.170    0.090    0.290   (-0.513, 0.693)
 21     50   0.060   -0.480    0.332   (-1.171, 0.210)
 22     55   0.020   -1.051    0.377   (-1.834, -0.267)
 23     60   0.000   -1.621    0.423   (-2.500, -0.742)

3.35
a. For each 1% increase in ln(body mass), the mean ln(eye mass) is estimated to increase by anywhere from 0.25 to 0.30.
b. For each 1% increase in ln(body mass), the mean ln(orbit axis angle) is estimated to decrease by anywhere from 0.14 to 0.50.

3.36
a. $\hat\beta_0 = 0.5151$ and $\hat\beta_1 = 0.000021$
b. To determine if there is a positive linear relationship between elevation and slugging percentage, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 > 0$
From the printout, the test statistic is t = 2.89 and the p-value is p = 0.008/2 = 0.004. Since the p-value is less than $\alpha$ (p = 0.004 < 0.01), $H_0$ is rejected. There is sufficient evidence to indicate there is a positive linear relationship between elevation and slugging percentage at $\alpha = 0.01$.
c. 
Using MINITAB, the scatterplot is:

[Scatterplot of SLUGPCT vs ELEVATION; ELEVATION axis from 0 to 6000]

Denver's elevation is much greater than all the others. In addition, if the observation for Denver is deleted, there does not appear to be much of a relationship between elevation and slugging percentage.
d. Using MINITAB, the results are:

Analysis of Variance
Source         DF     Adj SS     Adj MS   F-Value   P-Value
Regression      1   0.001389   0.001389      0.98     0.332
  ELEVATION     1   0.001389   0.001389      0.98     0.332
Error          26   0.036922   0.001420
  Lack-of-Fit  22   0.036685   0.001667     28.08     0.003
  Pure Error    4   0.000238   0.000059
Total          27   0.038311

Coefficients
Term            Coef    SE Coef   T-Value   P-Value    VIF
Constant      0.5154     0.0107     48.33     0.000
ELEVATION   0.000020   0.000020      0.99     0.332   1.00

Regression Equation
SLUGPCT = 0.5154 + 0.000020 ELEVATION

$\hat\beta_0 = 0.5154$ and $\hat\beta_1 = 0.000020$
To determine if there is a positive linear relationship between elevation and slugging percentage, we test:
$H_0: \beta_1 = 0$
$H_a: \beta_1 > 0$
From the printout, the test statistic is t = 0.99 and the p-value is p = 0.332/2 = 0.166. Since the p-value is not less than $\alpha$ (p = 0.166 > 0.01), $H_0$ is not rejected. There is insufficient evidence to indicate there is a positive linear relationship between elevation and slugging percentage at $\alpha = 0.01$.
The new plot is:

[Scatterplot of SLUGPCT vs ELEVATION; ELEVATION axis from 0 to 1200]

3.37
a. Years of education and yearly income
b. Number of hours playing video games and GPA

3.38
a. If $r = 0.7$, there is a positive linear relationship between x and y. As x increases, y tends to increase. The slope is positive.
b. If $r = -0.7$, there is a negative linear relationship between x and y. As x increases, y tends to decrease. The slope is negative.
c. 
If $r = 0$, there is a 0 slope. There is no linear relationship between x and y.
d. If $r^2 = 0.64$, then r is either 0.8 or -0.8. The linear relationship between x and y could be either positive or negative.

3.39
a. From Exercise 3.6, $SS_{xx} = 17.5$, $SS_{yy} = 14$ and $SS_{xy} = 15$.
$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \dfrac{15}{\sqrt{17.5(14)}} = 0.9583$
From Exercise 3.19, SSE = 1.1435.
$r^2 = \dfrac{SS_{yy} - SSE}{SS_{yy}} = \dfrac{14 - 1.1435}{14} = 0.9183$
There is a strong positive correlation between x and y. We can explain 91.83% of the variation in the sample y's using the linear model with x.
b. In Exercise 3.7, $SS_{xx} = 10$, $SS_{yy} = 16$ and $SS_{xy} = -12$.
$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \dfrac{-12}{\sqrt{10(16)}} = -0.9487$
In Exercise 3.7, $SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 16 - (-1.2)(-12) = 1.6$.
$r^2 = \dfrac{SS_{yy} - SSE}{SS_{yy}} = \dfrac{16 - 1.6}{16} = 0.90$
There is a strong negative linear correlation between x and y. We can explain 90% of the variation in the sample y's using the linear model with x.

3.40
We would expect the crime rate to increase as U.S. population increases. Therefore, we expect a positive correlation between the variables.

3.41
We would expect the GPA of a college student to be correlated with his/her I.Q. As the I.Q. score increases, we would expect the GPA to increase. Thus, the correlation would be positive.

3.42
a. $r = 0.975$. There is a very strong linear relationship between the sale price of a house and the appraised property market value.
b. $r^2 = 0.9516$. 95.16% of the sample variation in home sale prices is explained by the linear relationship between sale price and the appraised value of the house.

3.43
a. $r^2 = 0.18$. 18% of the sample variation in the number of points scored is explained by the linear relationship between the number of points scored and the number of yards from the opposing goal line.
b. $r = -\sqrt{0.18} = -0.424$. The value of r is negative because the coefficient associated with the number of yards from the opposing goal line in the fitted regression line is negative.

3.44
a. 
Since the p-value of 0.33 is greater than $\alpha = 0.05$, we cannot conclude that there is a significant linear relationship between cooperation use and average payoff.
b. Since the p-value of 0.66 is greater than $\alpha = 0.05$, we cannot conclude that there is a significant linear relationship between defection use and average payoff.
c. Since the p-value of 0.001 is smaller than $\alpha = 0.05$, we can conclude that there is a significant linear relationship between punishment use and average payoff.

3.45
a. Since the p-value of 0.07 is greater than $\alpha = 0.05$, we cannot conclude that there is a significant linear relationship between baseline and follow-up physical activity for obese young adults; fail to reject $H_0: \rho = 0$ at $\alpha = .05$.
b. A possible scatterplot of the data would be:

[Scatterplot of Follow-up vs Baseline]

c. $r^2 = (.50)^2 = 0.25$. Thus, 25% of the variability around the sample mean for the follow-up total number of movements is explained by the linear relationship between the baseline total number of movements and the follow-up total number of movements for the obese young adults.
d. Since the correlation value itself is close to zero and the p-value of 0.66 is greater than $\alpha = 0.05$, we cannot conclude that there is a significant linear relationship between baseline and follow-up physical activity for normal weight young adults; fail to reject $H_0: \rho = 0$ at $\alpha = .05$.
e. A possible scatterplot is:

[Scatterplot of Follow-up vs Baseline]

f. $r^2 = (-.12)^2 = 0.0144$. Thus, 1.44% of the variability around the sample mean for the follow-up total number of movements is explained by the linear relationship between the baseline total number of movements and the follow-up total number of movements for the normal weight young adults.
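
The $r^2$ values in Exercise 3.45 follow directly from the reported correlations. The short Python sketch below reproduces them and also shows the usual t statistic for testing $H_0: \rho = 0$, $t = r\sqrt{n-2}/\sqrt{1-r^2}$. The function names are illustrative, and because the solution does not state the sample sizes for Exercise 3.45, the sample size is left as an argument rather than assumed.

```python
import math


def r_squared(r):
    """Coefficient of determination from a sample correlation coefficient."""
    return r ** 2


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Correlations reported in Exercise 3.45
r_obese = 0.50     # obese young adults
r_normal = -0.12   # normal weight young adults

print(f"obese:  r^2 = {r_squared(r_obese):.4f}")
print(f"normal: r^2 = {r_squared(r_normal):.4f}")
```

Given a sample size n, `correlation_t(r, n)` could be compared with a t critical value on n - 2 degrees of freedom, which is the same decision the quoted p-values express.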

3.46
In Exercise 3.13, $SS_{xx} = 56{,}452.958$ and $SS_{xy} = -129.94167$.
$SS_{yy} = \sum y^2 - (\sum y)^2/n = 769.72 - 135.8^2/24 = 1.318333$
$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \dfrac{-129.94167}{\sqrt{56{,}452.958(1.318333)}} = -0.4763$
$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 1.318333 - (-0.002301769)(-129.94167) = 1.01924$
$r^2 = \dfrac{SS_{yy} - SSE}{SS_{yy}} = \dfrac{1.318333 - 1.01924}{1.318333} = 0.2269$
22.69% of the variability around the sample mean for the sweetness index can be explained by the linear relationship between the sweetness index and the amount of water-soluble pectin.

3.47
a. There is a rather weak negative linear relationship between the numerical value of a last name and the response time.
b. Since the p-value is less than $\alpha$ (p = 0.018 < 0.05), $H_0$ is rejected. There is sufficient evidence to indicate a negative linear relationship between the numerical value of a last name and the response time.
c. Yes, the analysis supports the researchers' last name effect theory. Because the correlation coefficient is negative, as the numerical value of the last name increases, the response time tends to decrease.

3.48
Using the values computed in Exercises 3.15 and 3.33:
$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}} = \dfrac{19.975}{\sqrt{756(9.70021597)}} = 0.2333$
Because r is fairly close to 0, there is a very weak positive linear relationship between the proportion of names recalled and position.
$r^2 = 0.2333^2 = 0.0544$
5.44% of the sample variance of proportion of names recalled around the sample mean is explained by the linear relationship between proportion of names recalled and position.

3.49
a. To determine if the true population correlation coefficient relating NRMSE and bias is positive, we test:
$H_0: \rho = 0$
$H_a: \rho > 0$
The test statistic is $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \dfrac{0.2838\sqrt{3{,}600-2}}{\sqrt{1-0.2838^2}} = 17.753$.
No $\alpha$ value was given, so we will use $\alpha = 0.05$. The rejection region requires $\alpha = 0.05$ in the upper tail of the t distribution. From Table 2, Appendix D, with df = n - 2 = 3,600 - 2 = 3598, $t_{0.05} = 1.645$. 
The rejection region is $t > 1.645$.
Since the observed value of the test statistic falls in the rejection region (t = 17.753 > 1.645), $H_0$ is rejected. There is sufficient evidence to indicate the true population correlation coefficient relating NRMSE and bias is positive at $\alpha = 0.05$.
b. No, we would not recommend using NRMSE as a linear predictor of bias. The estimated correlation coefficient is r = 0.2838. This indicates that there is a rather weak positive linear relationship between NRMSE and bias. The sample size was extremely large. The larger the sample size, the easier it is to find statistical significance. In this case, there is statistical significance, but not practical significance.

3.50
a. The sample correlation coefficient between PSI and PHI-F is r = 0.401. There is a weak positive linear relationship between the perceived sensory intensity and the perceived hedonic intensity for favorite food. The sample correlation coefficient between PSI and PHI-L is r = -0.375. There is a weak negative linear relationship between the perceived sensory intensity and the perceived hedonic intensity for least favorite food.
b. Yes, we agree that those with the greatest taste intensity tend to experience more extreme food likes and dislikes. As the taste intensity increases, the intensity of favorite foods tends to increase. As the taste intensity increases, the intensity of least favorite foods tends to decrease.

3.51
a. $r^2 = 0.948$. 94.8% of the variability around the mean ln(eye mass) is explained by the linear relationship between ln(eye mass) and ln(body mass).
b. From 3.35a, the relationship between ln(eye mass) and ln(body mass) is positive. Therefore, $r = \sqrt{0.948} = 0.974$. There is a very strong positive linear relationship between ln(eye mass) and ln(body mass).
c. $r^2 = .375$. 
37.5% of the variability around the mean ln(orbit axis angle) is explained by the linear relationship between ln(orbit axis angle) and ln(body mass).
d. From 3.35b, the relationship between ln(orbit axis angle) and ln(body mass) is negative. Therefore, $r = -\sqrt{0.375} = -0.612$. There is a moderate negative linear relationship between ln(orbit axis angle) and ln(body mass).

3.52
a. First, examine the formulas for the confidence interval and the prediction interval. The only difference is that the prediction interval has an extra term (a "1") beneath the radical. Thus, the prediction interval must be wider:
Confidence interval: $\hat{y} \pm t_{\alpha/2}\, s\sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$
Prediction interval: $\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$
The error in estimating the mean value of y, $E(y)$, for a given value of x, say $x_p$, is the distance between the least squares line, $\hat{y} = \hat\beta_0 + \hat\beta_1 x$, and the true line of means, $E(y) = \beta_0 + \beta_1 x$. In contrast, the error in predicting some future value of y, $\hat{y} - y_p$, is the sum of two errors: the error of estimating the mean of y, $E(y)$, plus the random error of the actual values of y around its mean. Consequently, the error of predicting a particular value of y will be larger than the error of estimating the mean value of y for a particular value of x.
b. Since the standard error contains the term $\dfrac{(x_p - \bar{x})^2}{SS_{xx}}$, the further $x_p$ is from $\bar{x}$, the larger the standard error. This causes the confidence intervals to be wider for values of $x_p$ further from $\bar{x}$. The implication is that our best (narrowest) confidence intervals will be found when $x_p = \bar{x}$.

3.53
a. $\hat\beta_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{16.22}{4.77} = 3.400$
$SSE = SS_{yy} - \hat\beta_1 SS_{xy} = 59.21 - 3.4(16.22) = 4.062$
$s^2 = \dfrac{SSE}{n-2} = \dfrac{4.062}{20-2} = 0.226$
b. For x = 2.5, $\hat{y} = 2.1 + 3.4(2.5) = 10.6$.
The form of the 95% confidence interval is $\hat{y} \pm t_{\alpha/2}\, s\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{SS_{xx}}}$.
For confidence coefficient 0.95, $\alpha = 0.05$ and $\alpha/2 = 0.05/2 = 0.025$. 
From Table 2, Appendix D, with df \(= n - 2 = 20 - 2 = 18\), \(t_{0.025} = 2.101\). The 95% confidence interval is:
\[10.6 \pm 2.101\sqrt{0.226}\sqrt{\frac{1}{20} + \frac{(2.5 - 2.5)^2}{4.77}} \Rightarrow 10.6 \pm 0.223 \Rightarrow (10.377, 10.823)\]
We are 95% confident the mean value of y when \(x = 2.5\) is between 10.377 and 10.823.
c. For \(x = 2.0\), \(\hat{y} = 2.1 + 3.4(2.0) = 8.9\). The 95% confidence interval is:
\[8.9 \pm 2.101\sqrt{0.226}\sqrt{\frac{1}{20} + \frac{(2.0 - 2.5)^2}{4.77}} \Rightarrow 8.9 \pm 0.320 \Rightarrow (8.580, 9.220)\]
We are 95% confident the mean value of y when \(x = 2.0\) is between 8.580 and 9.220.
d. For \(x = 3.0\), \(\hat{y} = 2.1 + 3.4(3.0) = 12.3\). The 95% confidence interval is:
\[12.3 \pm 2.101\sqrt{0.226}\sqrt{\frac{1}{20} + \frac{(3.0 - 2.5)^2}{4.77}} \Rightarrow 12.3 \pm 0.320 \Rightarrow (11.980, 12.620)\]
We are 95% confident the mean value of y when \(x = 3.0\) is between 11.980 and 12.620.
e. The width of the interval in (b) is \(10.823 - 10.377 = 0.446\). The width of the interval in (c) is \(9.220 - 8.580 = 0.640\). The width of the interval in (d) is \(12.620 - 11.980 = 0.640\). As the value of x moves away from \(\bar{x} = 2.5\), the confidence interval gets wider.
f. The 95% prediction interval is \(\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}\):
\[12.3 \pm 2.101\sqrt{0.226}\sqrt{1 + \frac{1}{20} + \frac{(3.0 - 2.5)^2}{4.77}} \Rightarrow 12.3 \pm 1.049 \Rightarrow (11.251, 13.349)\]
We are 95% confident that the actual value of y will be between 11.251 and 13.349 when the value of x is 3.

3.54 a. No. We know there is a significant linear relationship between sale price and appraised value. However, the actual sale prices may be scattered quite far from the predicted line.
b. From the printout, the 95% prediction interval for the actual sale price when the appraised value is $300,000 is (285.938, 561.741) or ($285,938, $561,741). We are 95% confident that the actual sale price for a home appraised at $300,000 is between $285,938 and $561,741.
c. From the printout, the 95% confidence interval for the mean sale price when the appraised value is $300,000 is (408.119, 439.560) or ($408,119, $439,560).
We are 95% confident that the mean sale price for a home appraised at $300,000 is between $408,119 and $439,560.

3.55 a. Researchers should use a prediction interval for y with \(x_p = 10\):
\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow \hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(10 - \bar{x})^2}{SS_{xx}}}\]
b. Researchers should use a confidence interval for the mean value of y, \(E(y)\), with \(x_p = 10\):
\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow \hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(10 - \bar{x})^2}{SS_{xx}}}\]

3.56 a. We are 95% confident that the actual value of the angular size of the Moon is between 323.502 and 326.108 when the height above the horizon is 50 degrees.
b. We are 95% confident that the mean value of the angular size of the Moon is between 324.448 and 325.163 when the height above the horizon is 50 degrees.
c. No, we would not recommend using the least squares line to predict the angular size of the Moon for a height of 80 degrees because 80 degrees is outside the observed range of data used to construct the least squares line.

3.57 For \(x = 300\), the confidence interval for \(E(y)\) is (5.45812, 5.65964). We are 90% confident that the mean sweetness index is between 5.458 and 5.660 when the amount of pectin is 300.

3.58 a. From Exercises 3.15 and 3.33, \(\bar{x} = 5.5\), \(SS_{xx} = 756\), \(s = 0.25415\), and \(\hat{y} = 0.5704 + 0.0264x\). For \(x = 5\), \(\hat{y} = 0.5704 + 0.0264(5) = 0.7024\). For confidence coefficient 0.99, \(\alpha = 0.01\) and \(\alpha/2 = 0.01/2 = 0.005\). From Table 2, Appendix D, with df \(= n - 2 = 144 - 2 = 142\), \(t_{0.005} = 2.576\). The 99% confidence interval is:
\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 0.7024 \pm 2.576(0.2542)\sqrt{\frac{1}{144} + \frac{(5 - 5.5)^2}{756}} \Rightarrow 0.7024 \pm 0.0559 \Rightarrow (0.6465, 0.7583)\]
We are 99% confident that the mean recall of all those in the 5th position is between 0.6465 and 0.7583.
b. For confidence coefficient 0.99, \(\alpha = 0.01\) and \(\alpha/2 = 0.01/2 = 0.005\). From Table 2, Appendix D, with df \(= n - 2 = 144 - 2 = 142\), \(t_{0.005} = 2.576\).
The 99% prediction interval is:
\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 0.7024 \pm 2.576(0.2542)\sqrt{1 + \frac{1}{144} + \frac{(5 - 5.5)^2}{756}} \Rightarrow 0.7024 \pm 0.6572 \Rightarrow (0.0452, 1.3596)\]
We are 99% confident that the actual recall of a person in the 5th position is between 0.0452 and 1.3596. Since the proportion of names recalled cannot be larger than 1, the actual proportion recalled will be between 0.0452 and 1.000.
c. The prediction interval in part b is wider than the confidence interval in part a. The prediction interval will always be wider than the confidence interval. The confidence interval estimates the mean of all observations for a particular value of x. The prediction interval is an interval for the actual value of the dependent variable for a single observation at a particular value of x.

3.59 a. From Exercises 3.16 and 3.34, \(\bar{x} = 22.87\), \(SS_{xx} = 6906.608\), \(s = 0.8573\), and \(\hat{y} = 5.22 - 0.114x\). For \(x = 15\), \(\hat{y} = 5.22 - 0.114(15) = 3.51\). For confidence coefficient 0.90, \(\alpha = 0.10\) and \(\alpha/2 = 0.10/2 = 0.05\). From Table 2, Appendix D, with df \(= n - 2 = 23 - 2 = 21\), \(t_{0.05} = 1.721\). The 90% confidence interval is:
\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 3.51 \pm 1.721(0.8573)\sqrt{\frac{1}{23} + \frac{(15 - 22.87)^2}{6906.608}} \Rightarrow 3.51 \pm 0.34 \Rightarrow (3.17, 3.85)\]
We are 90% confident that the mean mass of all spills with an elapsed time of 15 minutes is between 3.17 and 3.85.
b. For confidence coefficient 0.90, \(\alpha = 0.10\) and \(\alpha/2 = 0.10/2 = 0.05\). From Table 2, Appendix D, with df \(= n - 2 = 23 - 2 = 21\), \(t_{0.05} = 1.721\). The 90% prediction interval is:
\[\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}} \Rightarrow 3.51 \pm 1.721(0.8573)\sqrt{1 + \frac{1}{23} + \frac{(15 - 22.87)^2}{6906.608}} \Rightarrow 3.51 \pm 1.514 \Rightarrow (2.00, 5.02)\]
We are 90% confident that the mass of a single spill with an elapsed time of 15 minutes is between 2.00 and 5.02.

3.60 a.
To determine if the model is adequate for predicting nitrogen amount, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = 32.80\) and the p-value is \(p < 0.0001\). Since the p-value is so small (\(p\) [...]

[...] \(1.980\)), \(H_0\) is rejected. There is sufficient evidence to indicate that there is a linear relationship between the monthly price of recycled colored plastic bottles and the monthly price of naphtha at \(\alpha = 0.05\).
\(r^2 = 0.69\). 69% of the sample variation around the mean monthly price of recycled colored plastic bottles is explained by the linear relationship between the monthly price of recycled colored plastic bottles and the monthly price of naphtha.

3.63 a. Using MINITAB, the results are:

Regression Analysis: Corrupt versus GDP

Analysis of Variance
Source       DF   Adj SS    Adj MS   F-Value  P-Value
Regression    1   3345.8   3345.76     45.33    0.000
Error        11    811.9     73.81
Total        12   4157.7

Model Summary
S         R-sq    R-sq(adj)
8.59141   80.47%  78.70%

Coefficients
Term      Coef       SE Coef    T-Value  P-Value
Constant  25.89      3.09          8.37    0.000
GDP       0.000985   0.000146      6.73    0.000

Regression Equation
Corrupt = 25.89 + 0.000985 GDP

The fitted regression line is \(\hat{y} = 25.89 + 0.000985\,\text{GDP}\). To determine if GDP per capita is a linear predictor of corruption level, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = 6.73\) and the p-value is \(p = 0.000\). Since the p-value is so small, \(H_0\) is rejected. There is sufficient evidence to indicate GDP per capita is a linear predictor of corruption level for any reasonable value of \(\alpha\).
\(r^2 = 0.8047\). This indicates that 80.47% of the variability in the corruption values is explained by the linear relationship between the corruption values and the GDP per capita.
b.
Using MINITAB, the results are:

Regression Analysis: Corrupt versus PolR

Analysis of Variance
Source       DF   Adj SS   Adj MS   F-Value  P-Value
Regression    1     2528   2527.6     17.06    0.002
Error        11     1630    148.2
Total        12     4158

Model Summary
S         R-sq    R-sq(adj)
12.1732   60.79%  57.23%

Coefficients
Term      Coef    SE Coef  T-Value  P-Value
Constant  66.06   7.34        9.00    0.000
PolR      -6.25   1.51       -4.13    0.002

Regression Equation
Corrupt = 66.06 - 6.25 PolR

The fitted regression line is \(\hat{y} = 66.06 - 6.25\,\text{PolR}\). To determine if degree of freedom in political rights is a linear predictor of corruption level, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = -4.13\) and the p-value is \(p = 0.002\). Since the p-value is so small, \(H_0\) is rejected. There is sufficient evidence to indicate that degree of freedom in political rights is a linear predictor of corruption level for any value of \(\alpha > 0.002\).
\(r^2 = 0.6079\). This indicates that 60.79% of the variability in the corruption values is explained by the linear relationship between the corruption values and the degree of freedom in political rights.
c. Both variables, GDP per capita and degree of freedom in political rights, are significant predictors of corruption levels. Of the two, GDP per capita is a better predictor because the \(r^2\) value is larger and the p-value for the test is smaller.

3.64 Using MINITAB, a scatterplot of the data is:
[Scatterplot of MTBE vs pH]
From the plot, there does not appear to be a linear relationship between MTBE and pH level. The proposed linear regression model is \(y = \beta_0 + \beta_1 x + \varepsilon\). Using MINITAB, an analysis of the data is:
Analysis of Variance
Source       DF    Adj SS   Adj MS   F-Value  P-Value
Regression    1      2.01    2.008      0.08    0.782
Error       221   5785.93   26.181
Total       222   5787.94

Model Summary
S         R-sq   R-sq(adj)  R-sq(pred)
5.11670   0.03%  0.00%      0.00%

Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  0.35    3.14        0.11    0.911
pH        0.116   0.420       0.28    0.782  1.00

The parameter estimates of the least squares line are \(\hat{\beta}_0 = 0.35\) and \(\hat{\beta}_1 = 0.116\). The least squares line is \(\hat{y} = 0.35 + 0.116x\).
The least squares estimate of the slope, \(\hat{\beta}_1 = 0.116\), implies that the estimated MTBE increases by 0.116 for each one-unit increase in the pH level. This interpretation is valid only over the observed values of the pH level, which range from 5.28 to 9.48. The estimated y-intercept, \(\hat{\beta}_0 = 0.35\), has no practical meaning in this example because 0 is not within the observed range of the pH levels.
The estimate of \(\sigma\) is \(s = 5.1167\). The value of this estimate is very large compared to most of the values of MTBE.
To determine if there is a linear relationship between the MTBE and the pH level, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = 0.28\) and the p-value is \(p = 0.782\). Since the p-value is so large, \(H_0\) will not be rejected for any reasonable value of \(\alpha\). There is insufficient evidence to indicate there is a linear relationship between the MTBE and the pH level.
\(r^2 = 0.0003 \approx 0.00\). This indicates that essentially none (0.03%) of the variability in the MTBE values is explained by the linear relationship between the MTBE values and the pH levels. A linear regression model therefore does not explain the relationship between MTBE and pH.

3.65 Using MINITAB, a scatterplot of the data is:
[Scatterplot of HEATRATE vs RPM]
From the plot, there is evidence to indicate a linear relationship between heat rate and speed.
The proposed linear regression model is \(y = \beta_0 + \beta_1 x + \varepsilon\). Using MINITAB, an analysis of the data is:

Analysis of Variance
Source          DF      Adj SS      Adj MS   F-Value  P-Value
Regression       1   119598530   119598530    160.95    0.000
  RPM            1   119598530   119598530    160.95    0.000
Error           65    48298678      743057
  Lack-of-Fit   28    28773369     1027620      1.95    0.029
  Pure Error    37    19525309      527711
Total           66   167897208

Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
862.007   71.23%  70.79%     69.63%

Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  9470     164        57.73    0.000
RPM       0.1917   0.0151     12.69    0.000  1.00

Regression Equation
HEATRATE = 9470 + 0.1917 RPM

The parameter estimates of the least squares line are \(\hat{\beta}_0 = 9470\) and \(\hat{\beta}_1 = 0.1917\). The least squares line is \(\hat{y} = 9470 + 0.1917x\).
The least squares estimate of the slope, \(\hat{\beta}_1 = 0.1917\), implies that the estimated heat rate increases by 0.1917 units for each one-unit increase in the speed. This interpretation is valid only over the observed values of the speed, which range from 3,000 to 33,000. The estimated y-intercept, \(\hat{\beta}_0 = 9470\), has no practical meaning in this example because 0 is not within the observed range of the speed levels.
The estimate of \(\sigma\) is \(s = 862.007\). We expect most of the observations to fall within \(2s = 2(862.007) = 1724.014\) units of their predicted values.
To determine if there is a linear relationship between the heat rate and the speed, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = 12.69\) and the p-value is \(p = 0.000\). Since the p-value is so small, \(H_0\) will be rejected for any reasonable value of \(\alpha\). There is sufficient evidence to indicate there is a linear relationship between the heat rate and speed.
\(r^2 = 0.7123\). This indicates that 71.23% of the variability in the heat rate values is explained by the linear relationship between heat rate and the speed. This indicates that a linear regression line models the relationship between heat rate and speed fairly well.
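As a quick arithmetic check on the summary statistics quoted for Exercise 3.65, the values of \(r^2\), \(s\), and the slope's t statistic can be recomputed from the sums of squares and the coefficient row of the printout. This is only an illustrative sketch: the variable names are ours, and the inputs are copied by hand from the MINITAB output.

```python
import math

# Values copied from the MINITAB printout for Exercise 3.65 (HEATRATE vs RPM).
ss_regression = 119_598_530
ss_error = 48_298_678
ss_total = 167_897_208
df_error = 65
coef_rpm, se_rpm = 0.1917, 0.0151

# Coefficient of determination: share of total variation explained by the line.
r_squared = ss_regression / ss_total

# Estimate of sigma: square root of the mean square for error.
s = math.sqrt(ss_error / df_error)

# t statistic for H0: beta1 = 0. It can differ slightly from the printed value
# because the printed coefficient and standard error are already rounded.
t_slope = coef_rpm / se_rpm

print(f"r^2 = {r_squared:.4f}")
print(f"s = {s:.3f}, 2s = {2 * s:.3f}")
print(f"t = {t_slope:.2f}")
```

The regression and error sums of squares add up to the total, which is a useful sanity check when copying numbers from a printout.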
3.66 Using MINITAB, a scatterplot of the data is:
[Scatterplot of ACCURACY vs DISTANCE]
From the plot, there is evidence to indicate a linear relationship between accuracy and distance. The proposed linear regression model is \(y = \beta_0 + \beta_1 x + \varepsilon\). Using MINITAB, an analysis of the data is:

Analysis of Variance
Source          DF    Adj SS    Adj MS   F-Value  P-Value
Regression       1    874.99   874.989    174.95    0.000
  DISTANCE       1    874.99   874.989    174.95    0.000
Error           38    190.06     5.001
  Lack-of-Fit   36    176.55     4.904      0.73    0.735
  Pure Error     2     13.51     6.753
Total           39   1065.04

Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
2.23639   82.16%  81.69%     79.26%

Coefficients
Term      Coef      SE Coef  T-Value  P-Value  VIF
Constant  250.1     14.2       17.58    0.000
DISTANCE  -0.6294   0.0476    -13.23    0.000  1.00

Regression Equation
ACCURACY = 250.1 - 0.6294 DISTANCE

The parameter estimates of the least squares line are \(\hat{\beta}_0 = 250.1\) and \(\hat{\beta}_1 = -0.6294\). The least squares line is \(\hat{y} = 250.1 - 0.6294x\).
The least squares estimate of the slope, \(\hat{\beta}_1 = -0.6294\), implies that the estimated accuracy decreases by 0.6294 units for each additional yard of distance. This interpretation is valid only over the observed values of distance, which range from 293.2 to 318.9 yards. The estimated y-intercept, \(\hat{\beta}_0 = 250.1\), has no practical meaning in this example because 0 is not within the observed range of distances.
The estimate of \(\sigma\) is \(s = 2.23639\). We expect most of the observations to fall within \(2s = 2(2.23639) = 4.473\) units of their predicted values.
To determine if there is a negative linear relationship between accuracy and distance, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 < 0\)

[...]

The test statistic is \(t = \dfrac{\hat{\beta}_1}{s/\sqrt{\sum x^2}} = \dfrac{3.158333}{1.1328/\sqrt{240}} = 43.193\).
The rejection region requires \(\alpha = 0.05\) in the upper tail of the t distribution. From Table 2, Appendix D, with df \(= n - 1 = 8 - 1 = 7\), \(t_{0.05} = 1.895\). The rejection region is \(t > 1.895\).
Since the observed value of the test statistic falls in the rejection region (\(t = 43.193 > 1.895\)), \(H_0\) is rejected. There is sufficient evidence to indicate that x and y are positively linearly related at \(\alpha = 0.05\).
d. The form of the confidence interval for \(\beta_1\) is \(\hat{\beta}_1 \pm t_{0.025}\left(\dfrac{s}{\sqrt{\sum x^2}}\right)\). For confidence coefficient 0.95, \(\alpha = 0.05\) and \(\alpha/2 = 0.05/2 = 0.025\). From Table 2, Appendix D, with df \(= n - 1 = 8 - 1 = 7\), \(t_{0.025} = 2.365\). The 95% confidence interval is:
\[\hat{\beta}_1 \pm t_{0.025}\left(\frac{s}{\sqrt{\sum x^2}}\right) \Rightarrow 3.158 \pm 2.365\left(\frac{1.1328}{\sqrt{240}}\right) \Rightarrow 3.158 \pm 0.173 \Rightarrow (2.985, 3.331)\]
e. The point estimate for y when \(x = 7\) is \(\hat{y} = 3.158(7) = 22.106\). The 95% confidence interval for \(E(y)\) is:
\[\hat{y} \pm t_{0.025}\, s \sqrt{\frac{x_p^2}{\sum x^2}} \Rightarrow 22.106 \pm 2.365(1.1328)\sqrt{\frac{7^2}{240}} \Rightarrow 22.106 \pm 1.211 \Rightarrow (20.895, 23.317)\]
f. The 95% prediction interval for y is:
\[\hat{y} \pm t_{0.025}\, s \sqrt{1 + \frac{x_p^2}{\sum x^2}} \Rightarrow 22.106 \pm 2.365(1.1328)\sqrt{1 + \frac{7^2}{240}} \Rightarrow 22.106 \pm 2.940 \Rightarrow (19.166, 25.046)\]

3.69 a. The results of the preliminary calculations are provided below:
\(n = 5\), \(\sum x^2 = 30\), \(\sum xy = -278\), \(\sum y^2 = 2589\)
Substituting into the formula for \(\hat{\beta}_1\), we have \(\hat{\beta}_1 = \dfrac{\sum xy}{\sum x^2} = \dfrac{-278}{30} = -9.2667\), and the least squares line is \(\hat{y} = -9.2667x\).
b. \(SSE = \sum y^2 - \hat{\beta}_1 \sum xy = 2589 - (-9.26666677)(-278) = 12.8667\)
\(s^2 = \dfrac{SSE}{n - 1} = \dfrac{12.8667}{5 - 1} = 3.2167\)
\(s = \sqrt{s^2} = \sqrt{3.2167} = 1.7935\)
c. To determine if x and y are negatively linearly related, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 < 0\)

[...]

The test statistic is \(t = \dfrac{\hat{\beta}_1}{s/\sqrt{\sum x^2}} = \dfrac{0.2085}{1.5869/\sqrt{158{,}400}} = 52.29\).
The rejection region requires \(\alpha = 0.05\) in the upper tail of the t distribution. From Table 2, Appendix D, with df \(= n - 1 = 10 - 1 = 9\), \(t_{0.05} = 1.833\). The rejection region is \(t > 1.833\).
Since the observed value of the test statistic falls in the rejection region (\(t = 52.29 > 1.833\)), \(H_0\) is rejected.
There is sufficient evidence to indicate that x and y are positively linearly related at \(\alpha = 0.05\).
d. The form of the confidence interval for \(\beta_1\) is \(\hat{\beta}_1 \pm t_{0.025}\left(\dfrac{s}{\sqrt{\sum x^2}}\right)\). For confidence coefficient 0.95, \(\alpha = 0.05\) and \(\alpha/2 = 0.05/2 = 0.025\). From Table 2, Appendix D, with df \(= n - 1 = 10 - 1 = 9\), \(t_{0.025} = 2.262\). The 95% confidence interval is:
\[\hat{\beta}_1 \pm t_{0.025}\left(\frac{s}{\sqrt{\sum x^2}}\right) \Rightarrow 0.2085 \pm 2.262\left(\frac{1.5869}{\sqrt{158{,}400}}\right) \Rightarrow 0.2085 \pm 0.0090 \Rightarrow (0.1995, 0.2175)\]
e. The point estimate for y when \(x = 125\) is \(\hat{y} = \hat{\beta}_1 x = 0.2085(125) = 26.06\). The 95% confidence interval for \(E(y)\) is:
\[\hat{y} \pm t_{0.025}\, s \sqrt{\frac{x_p^2}{\sum x^2}} \Rightarrow 26.06 \pm 2.262(1.5869)\sqrt{\frac{125^2}{158{,}400}} \Rightarrow 26.06 \pm 1.13 \Rightarrow (24.93, 27.19)\]
f. The 95% prediction interval for y is:
\[\hat{y} \pm t_{0.025}\, s \sqrt{1 + \frac{x_p^2}{\sum x^2}} \Rightarrow 26.06 \pm 2.262(1.5869)\sqrt{1 + \frac{125^2}{158{,}400}} \Rightarrow 26.06 \pm 3.76 \Rightarrow (22.30, 29.82)\]

3.71 a. Some preliminary calculations are:
\(n = 8\), \(\sum x^2 = 59.75\), \(\sum xy = 320.5\), \(\sum y^2 = 1738\)
Then, \(\hat{\beta}_1 = \dfrac{\sum xy}{\sum x^2} = \dfrac{320.5}{59.75} = 5.364016736 \approx 5.364\), and the least squares line is \(\hat{y} = 5.364x\).
b. To determine if there is a linear relationship between drug dosage and decrease in pulse rate, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_1}{s/\sqrt{\sum x^2}}\), where
\[s = \sqrt{s^2} = \sqrt{\frac{SSE}{n - 1}} = \sqrt{\frac{\sum y^2 - \hat{\beta}_1 \sum xy}{n - 1}} = \sqrt{\frac{1738 - (5.364)(320.5)}{8 - 1}} = 1.640\]
Substituting, we have \(t = \dfrac{5.364}{1.640/\sqrt{59.75}} = 25.28\).
The rejection region requires \(\alpha/2 = 0.10/2 = 0.05\) in each tail of the t distribution. From Table 2, Appendix D, with df \(= n - 1 = 8 - 1 = 7\), \(t_{0.05} = 1.895\). The rejection region is \(t < -1.895\) or \(t > 1.895\).
Since the observed value of the test statistic falls in the rejection region (\(t = 25.28 > 1.895\)), \(H_0\) is rejected. There is sufficient evidence to indicate that drug dosage and decrease in pulse rate are linearly related at \(\alpha = 0.10\).
c.
We want to predict the decrease in pulse rate y corresponding to a drug dosage of \(x_p = 3.5\) cubic centimeters. First, we obtain the point estimate:
\(\hat{y} = \hat{\beta}_1 x = 5.364(3.5) = 18.774\)
For confidence coefficient 0.99, \(\alpha = 0.01\) and \(\alpha/2 = 0.01/2 = 0.005\). From Table 2, Appendix D, with df \(= n - 1 = 8 - 1 = 7\), \(t_{0.005} = 3.499\). The 99% prediction interval is:
\[\hat{y} \pm t_{0.005}\, s \sqrt{1 + \frac{x_p^2}{\sum x^2}} \Rightarrow 18.774 \pm 3.499(1.640)\sqrt{1 + \frac{(3.5)^2}{59.75}} \Rightarrow 18.774 \pm 6.299 \Rightarrow (12.475, 25.073)\]
Therefore, we predict the decrease in pulse rate corresponding to a dosage of 3.5 cc to fall between 12.475 and 25.073 beats/minute with 99% confidence.

3.72 Some preliminary calculations are:
\(\sum x = 4305\), \(\sum x^2 = 1{,}652{,}025\), \(\sum y = 201{,}558\), \(\sum y^2 = 3{,}571{,}211{,}200\), \(\sum xy = 76{,}652{,}695\)
a. \(\hat{\beta}_1 = \dfrac{\sum xy}{\sum x^2} = \dfrac{76{,}652{,}695}{1{,}652{,}025} = 46.39923427 \approx 46.3992\), and the least squares line is \(\hat{y} = 46.3992x\).
Using MINITAB, the scatterplot of the data with the fitted line is:
[Scatterplot of WEIGHT vs BAGS with the fitted lines yhat = 46.4x and yhat = 478.4 + 45.15x]
b. \(SS_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n} = 1{,}652{,}025 - \dfrac{4305^2}{15} = 416{,}490\)
\(SS_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{n} = 76{,}652{,}695 - \dfrac{4305(201{,}558)}{15} = 18{,}805{,}549\)
\(\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{18{,}805{,}549}{416{,}490} = 45.15246224 \approx 45.152\)
\(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \dfrac{201{,}558}{15} - 45.1524622\left(\dfrac{4305}{15}\right) = 478.443\)
The fitted line is \(\hat{y} = 478.443 + 45.152x\).
c. Since 0 is not contained in the observed range of values of the number of 50-pound bags in the shipment, \(\hat{\beta}_0\) has no practical interpretation. Therefore, a value of \(\hat{\beta}_0\) that differs from 0 is not unexpected.
d. First, we need to compute s.
\(SS_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n} = 3{,}571{,}211{,}200 - \dfrac{201{,}558^2}{15} = 862{,}836{,}042\)
\(SSE = SS_{yy} - \hat{\beta}_1 SS_{xy} = 862{,}836{,}042 - 45.15246224(18{,}805{,}549) = 13{,}719{,}200.9\)
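The sums of squares and estimates obtained so far in Exercise 3.72 can be reproduced directly from the five preliminary totals. The sketch below only checks the arithmetic; the helper name `fit_line` and the dictionary keys are our own choices, not part of the exercise.

```python
def fit_line(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Least squares estimates for y = b0 + b1*x computed from raw sums."""
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    sse = ss_yy - b1 * ss_xy
    return {"ss_xx": ss_xx, "ss_xy": ss_xy, "ss_yy": ss_yy,
            "b0": b0, "b1": b1, "sse": sse}


# Preliminary totals quoted in Exercise 3.72 (n = 15 shipments).
totals = dict(n=15, sum_x=4305, sum_y=201_558,
              sum_x2=1_652_025, sum_y2=3_571_211_200, sum_xy=76_652_695)

fit = fit_line(**totals)

# Slope of the model forced through the origin, as in part (a).
b1_origin = totals["sum_xy"] / totals["sum_x2"]

for name, value in fit.items():
    print(f"{name}: {value:,.4f}")
print(f"b1 (through origin): {b1_origin:.4f}")
```

Small differences from the hand-computed SSE are expected, because the worked solution rounds \(SS_{yy}\) before subtracting.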
\(s^2 = \dfrac{SSE}{n - 2} = \dfrac{13{,}719{,}200.9}{15 - 2} = 1{,}055{,}323.146\)
\(s = \sqrt{1{,}055{,}323.146} = 1027.2892\)
To determine if \(\beta_0\) should be included in the model, we test:
\(H_0: \beta_0 = 0\)
\(H_a: \beta_0 \neq 0\)
The test statistic is
\[t = \frac{\hat{\beta}_0}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS_{xx}}}} = \frac{478.4}{1027.289\sqrt{\dfrac{1}{15} + \dfrac{287^2}{416{,}490}}} = 0.906\]
The rejection region requires \(\alpha/2 = 0.10/2 = 0.05\) in each tail of the t distribution. From Table 2, Appendix D, with df \(= n - 2 = 15 - 2 = 13\), \(t_{0.05} = 1.771\). The rejection region is \(t < -1.771\) or \(t > 1.771\).
Since the observed value of the test statistic does not fall in the rejection region (\(t = 0.906 \not> 1.771\)), \(H_0\) is not rejected. There is insufficient evidence to indicate that \(\beta_0\) should be included in the model at \(\alpha = 0.10\).

3.73 a. Some preliminary calculations are:
\(n = 10\), \(\sum x^2 = 1{,}933{,}154\), \(\sum xy = 98{,}946{,}257\), \(\sum y^2 = 5{,}066{,}358{,}119\)
Then, \(\hat{\beta}_1 = \dfrac{\sum xy}{\sum x^2} = \dfrac{98{,}946{,}257}{1{,}933{,}154} = 51.18384619 \approx 51.184\), and the least squares prediction equation is \(\hat{y} = 51.184x\).
b. To determine if population contributes to the prediction of electricity customers, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_1}{s/\sqrt{\sum x^2}}\), where
\[s = \sqrt{s^2} = \sqrt{\frac{SSE}{n - 1}} = \sqrt{\frac{\sum y^2 - \hat{\beta}_1 \sum xy}{n - 1}} = \sqrt{\frac{5{,}066{,}358{,}119 - 51.18385(98{,}946{,}257)}{10 - 1}} = 460.4036\]
Substituting, we have \(t = \dfrac{51.18}{460.4036/\sqrt{1{,}933{,}154}} = 154.56\).
The rejection region requires \(\alpha/2 = 0.01/2 = 0.005\) in each tail of the t distribution. From Table 2, Appendix D, with df \(= n - 1 = 10 - 1 = 9\), \(t_{0.005} = 3.250\). The rejection region is \(t < -3.250\) or \(t > 3.250\).
Since the observed value of the test statistic falls in the rejection region (\(t = 154.56 > 3.250\)), \(H_0\) is rejected. There is sufficient evidence to indicate that population contributes to the prediction of electricity customers at \(\alpha = 0.01\).
c.
We need the following additional information:
\(\sum x = 4286\), \(\sum y = 220{,}297\)
\(SS_{xx} = 96{,}174.4\), \(SS_{xy} = 4{,}526{,}962.8\), \(SS_{yy} = 213{,}281{,}298\)
\(\hat{\beta}_1 = 47.07\), \(\hat{\beta}_0 = 1855.35\)
\(SSE = 195{,}568.4\), \(s^2 = 24{,}446.05\), \(s = 156.3523\)
The least squares prediction equation is \(\hat{y} = 1855.35 + 47.07x\).
To determine if population contributes to the prediction of electricity customers, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = \dfrac{\hat{\beta}_1}{s/\sqrt{SS_{xx}}} = \dfrac{47.07}{156.3523/\sqrt{96{,}174.4}} = 93.36\).
The rejection region requires \(\alpha/2 = 0.01/2 = 0.005\) in each tail of the t distribution. From Table 2, Appendix D, with df \(= n - 2 = 10 - 2 = 8\), \(t_{0.005} = 3.355\). The rejection region is \(t < -3.355\) or \(t > 3.355\).
Since the observed value of the test statistic falls in the rejection region (\(t = 93.36 > 3.355\)), \(H_0\) is rejected. There is sufficient evidence to indicate that population contributes to the prediction of electricity customers at \(\alpha = 0.01\).
d. Without running a formal test, we can compare the two models. The value of s for the model \(y = \beta_1 x + \varepsilon\) is 460.4036, while the value of s for the model \(y = \beta_0 + \beta_1 x + \varepsilon\) is 156.3523. Since the value of s is much smaller for the second model, it appears that the second model should be used. For a formal test, refer to part (d) of Exercise 3.72.
\(H_0: \beta_0 = 0\)
\(H_a: \beta_0 \neq 0\)
The test statistic is
\[t = \frac{\hat{\beta}_0 - 0}{s\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS_{xx}}}} = \frac{1855.35}{156.3523\sqrt{\dfrac{1}{10} + \dfrac{428.6^2}{96{,}174.4}}} = 8.37\]
The rejection region requires \(\alpha/2 = 0.01/2 = 0.005\) in each tail of the t distribution. Because this model estimates both \(\beta_0\) and \(\beta_1\), df \(= n - 2 = 10 - 2 = 8\) and, from Table 2, Appendix D, \(t_{0.005} = 3.355\). The rejection region is \(t < -3.355\) or \(t > 3.355\).
Since the observed value of the test statistic falls in the rejection region (\(t = 8.37 > 3.355\)), \(H_0\) is rejected. There is sufficient evidence to indicate that \(\beta_0\) should be included in the model at \(\alpha = 0.01\).

a.
Using MINITAB, the scatterplot is:
[Fitted line plot for Exercise 3.74: LOS = 3.306 + 0.01475 FACTORS; S = 2.10077, R-Sq = 37.4%, R-Sq(adj) = 36.1%]
b. From the printout, the least squares line is \(\hat{y} = 3.306 + 0.01475x\).
c. For every one-unit increase in the number of factors per patient, we estimate the patient's length of stay to increase by 0.01475 days.
d. To determine if the number of factors per patient contributes information for the prediction of the patient's length of stay, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = 5.36\) and the p-value is \(p < 0.0001\). Since the p-value is less than \(\alpha\) (\(p < 0.0001 < 0.05\)), \(H_0\) is rejected. There is sufficient evidence to indicate that the number of factors per patient contributes information for the prediction of the patient's length of stay at \(\alpha = 0.05\).
e. From the printout, the 95% confidence interval is (0.00922, 0.02029). We are 95% confident that for each additional factor per patient, the patient's mean length of stay will increase by between 0.00922 and 0.02029 days.
f. \(r = \sqrt{0.3740} = 0.6116\). There appears to be a moderate positive linear relationship between the number of factors and the length of stay.
g. \(r^2 = 0.3740\). 37.4% of the variability around the mean length of stay can be explained by the linear relationship between the number of factors and the length of stay.
h. From the printout, the 95% prediction interval is (2.44798, 10.98081).
i. There is a significant linear relationship between length of stay and the number of factors. However, the value of \(r^2\) is only 0.3740. Thus, only a little over a third of the variability in the lengths of stay is explained by the model. Many variables other than the number of factors could be affecting the lengths of stay.

3.75 a. \(y = \beta_0 + \beta_1 x + \varepsilon\)
b. A value of \(r = 0.68\) indicates a moderate positive linear relationship between RMP and SET ratings.
c.
The slope is positive since the correlation coefficient is positive.
d. Since the p-value is so small (\(p = 0.001\)), \(H_0\) is rejected for any value of \(\alpha > 0.001\). This indicates that there is a significant correlation between RMP and SET ratings.
e. \(r^2 = (0.68)^2 = 0.4624\). 46.24% of the variability of the sample SET ratings about their mean can be explained by the linear relationship between the SET ratings and the RMP ratings.

3.76 a. Yes. For the men, as the year increases, the winning time tends to decrease. The straight-line model is \(y = \beta_0 + \beta_1 x + \varepsilon\). We would expect the slope to be negative.
b. Yes. For the women, as the year increases, the winning time tends to decrease. The straight-line model is \(y = \beta_0 + \beta_1 x + \varepsilon\). We would expect the slope to be negative.
c. Since the women's line is steeper than the men's line, the slope of the women's line will be greater in absolute value.
d. No. The gathered data is from 1880 to 2000. Using this data to predict the time for the year 2020 would be very risky. We have no idea what the relationship between time and year will be outside the observed range. Thus, we would not recommend using this model.

3.77 Using MINITAB, the analyses are:

Analysis of Variance
Source        DF   Adj SS   Adj MS   F-Value  P-Value
Regression     1    72.04    72.04      7.11    0.056
  DIAMETER     1    72.04    72.04      7.11    0.056
Error          4    40.55    10.14
Total          5   112.59

Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
3.18403   63.98%  54.98%     0.00%

Coefficients
Term      Coef    SE Coef  90% CI           T-Value  P-Value  VIF
Constant  6.35    3.90     (-1.97, 14.68)      1.63    0.179
DIAMETER  0.950   0.356    (0.190, 1.709)      2.67    0.056  1.00

Regression Equation
POROSITY = 6.35 + 0.950 DIAMETER

Settings
Variable  Setting
DIAMETER  10

Prediction
Fit       SE Fit    90% CI               90% PI
15.8501   1.30529   (13.0674, 18.6327)   (8.51395, 23.1862)

a. The least squares line is \(\hat{y} = 6.35 + 0.950x\).
b.
\(\hat{\beta}_0 = 6.35\). Since 0 is not in the range of observed values for diameter, \(\hat{\beta}_0\) has no practical meaning.
c. From the printout, the 90% confidence interval is (0.190, 1.709). We are 90% confident that for each unit increase in diameter, the mean porosity will increase by between 0.190 and 1.709 units.
d. From the printout, the 90% prediction interval is (8.514, 23.186).

3.78 Using MINITAB, the analyses are:

Analysis of Variance
Source          DF   Seq SS   Contribution   Adj SS   Adj MS    F-Value  P-Value
Regression       1   0.2330   39.37%         0.2330   0.23300      9.09    0.009
  EMPATHY        1   0.2330   39.37%         0.2330   0.23300      9.09    0.009
Error           14   0.3588   60.63%         0.3588   0.02563
  Lack-of-Fit   10   0.2557   43.20%         0.2557   0.02557      0.99    0.552
  Pure Error     4   0.1031   17.42%         0.1031   0.02578
Total           15   0.5918   100.00%

Model Summary
S          R-sq    R-sq(adj)  PRESS      R-sq(pred)
0.160084   39.37%  35.04%     0.484291   18.16%

Coefficients
Term      Coef     SE Coef  95% CI             T-Value  P-Value  VIF
Constant  -0.392   0.220    (-0.864, 0.079)      -1.79    0.096
EMPATHY   0.0362   0.0120   (0.0104, 0.0619)      3.02    0.009  1.00

Regression Equation
ACTIVITY = -0.392 + 0.0362 EMPATHY

To determine if people scoring higher in empathy show higher pain-related brain activity, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 > 0\)
The test statistic is \(t = 3.02\) and the p-value is \(p = 0.009/2 = 0.0045\). Since the p-value is very small, \(H_0\) is rejected for any value of \(\alpha > 0.0045\). There is sufficient evidence to indicate that people scoring higher in empathy show higher pain-related brain activity for any \(\alpha > 0.0045\).

3.79 a. Since the p-value for the SG score is \(p = 0.739\) and is larger than the significance level of 0.05, we cannot conclude that ESLR score is linearly related to the SG score.
b. Since the p-value for the SR score is \(p = 0.012\) and is smaller than the significance level of 0.05, we can conclude that ESLR score is linearly related to the SR score.
c.
Since the p-value for the ER score is \(p = 0.022\) and is smaller than the significance level of 0.05, we can conclude that ESLR score is linearly related to the ER score.
d. \(100(r^2)\%\) of the sample variation in ESLR score can be explained by the linear relationship between ESLR and x (SG, SR, or ER score):
  (a) 0.2% of the sample variation in ESLR scores around their mean can be explained by the linear relationship between ESLR and SG scores.
  (b) 9.9% of the sample variation in ESLR scores around their mean can be explained by the linear relationship between ESLR and SR scores.
  (c) 7.8% of the sample variation in ESLR scores around their mean can be explained by the linear relationship between ESLR and ER scores.

3.80 a. Using MINITAB, the results of the analyses regressing the blood plasma level of 2,3,7,8-TCDD on the fat tissue level of 2,3,7,8-TCDD are:

Analysis of Variance
Source          DF    Adj SS    Adj MS   F-Value  P-Value
Regression       1   1105.19   1105.19    132.05    0.000
  FAT            1   1105.19   1105.19    132.05    0.000
Error           18    150.65      8.37
  Lack-of-Fit   15    137.85      9.19      2.15    0.289
  Pure Error     3     12.81      4.27
Total           19   1255.84

Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
2.89303   88.00%  87.34%     80.90%

Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  -0.150   0.841      -0.18    0.860
FAT       0.9009   0.0784     11.49    0.000  1.00

Regression Equation
PLASMA = -0.150 + 0.9009 FAT

The fitted prediction equation is \(\hat{y} = -0.150 + 0.9009x\).
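In simple linear regression, the F statistic in the analysis of variance table equals the square of the slope's t statistic, and R-sq is the regression sum of squares divided by the total sum of squares. The short sketch below checks both identities against the printout above; the variable names are ours and the inputs are copied by hand, so agreement is only up to the rounding in the printed values.

```python
import math

# Values copied from the MINITAB printout (PLASMA regressed on FAT).
ss_regression, ss_total = 1105.19, 1255.84
coef_fat, se_fat = 0.9009, 0.0784
f_value = 132.05

# R-sq as the explained share of the total sum of squares.
r_squared = ss_regression / ss_total

# The correlation coefficient takes the sign of the fitted slope.
r = math.copysign(math.sqrt(r_squared), coef_fat)

# Slope t statistic; its square should match the F-value up to rounding.
t_slope = coef_fat / se_fat

print(f"R-sq = {r_squared:.4f}, r = {r:.4f}")
print(f"t = {t_slope:.2f}, t^2 = {t_slope ** 2:.2f}, F = {f_value}")
```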
Using MINITAB, the results of the analyses regressing the fat tissue level of 2,3,7,8-TCDD on the blood plasma level of 2,3,7,8-TCDD are:

Analysis of Variance
Source          DF    Adj SS    Adj MS   F-Value  P-Value
Regression       1   1198.32   1198.32    132.05    0.000
  PLASMA         1   1198.32   1198.32    132.05    0.000
Error           18    163.35      9.07
  Lack-of-Fit   15    154.56     10.30      3.52    0.164
  Pure Error     3      8.79      2.93
Total           19   1361.67

Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
3.01245   88.00%  87.34%     80.90%

Coefficients
Term      Coef     SE Coef  T-Value  P-Value  VIF
Constant  0.970    0.846       1.15    0.267
PLASMA    0.9768   0.0850     11.49    0.000  1.00

Regression Equation
FAT = 0.970 + 0.9768 PLASMA

The fitted prediction equation is \(\hat{y} = 0.970 + 0.9768x\).
b. To determine if fat tissue level is a useful predictor of blood plasma level, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = 11.49\) and the p-value is \(p = 0.000\). Since the p-value is less than \(\alpha\) (\(p = 0.000 < 0.05\)), \(H_0\) is rejected. There is sufficient evidence to indicate fat tissue level is a useful predictor of blood plasma level at \(\alpha = 0.05\).
c. To determine if blood plasma level is a useful predictor of fat tissue level, we test:
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)
The test statistic is \(t = 11.49\) and the p-value is \(p = 0.000\). Since the p-value is less than \(\alpha\) (\(p = 0.000 < 0.05\)), \(H_0\) is rejected. There is sufficient evidence to indicate blood plasma level is a useful predictor of fat tissue level at \(\alpha = 0.05\).
d. If we fit a least squares line through the data, the relationship will be the same regardless of which variable is the dependent variable and which variable is the independent variable. The correlation coefficient and the coefficient of determination will be the same regardless of which variable is the dependent variable and which variable is the independent variable.

3.81 Using MINITAB, the analyses of the data are:
[Fitted line plot: STRIKES = 175.7 - 0.8195 AGE; S = 15.4349, R-Sq = 62.8%, R-Sq(adj) = 57.4%]
50 40 30 20 10 120 130 140 150 160 170 180 190 AGE Analysis of Variance Source DF Seq SS Contribution Adj SS Adj MS F-Value P-Value Regression 1 2810 62.76% 2810 2809.9 11.79 0.011 AGE 1 2810 62.76% 2810 2809.9 11.79 0.011 Error 7 1668 37.24% 1668 238.2 Total 8 4478 100.00% Model Summary S R-sq R-sq(adj) PRESS R-sq(pred) 15.4349 62.76% 57.43% 2582.04 42.33% Coefficients Term Coef SE Coef T-Value P-Value Constant 175.7 38.6 (84.4, 267.0) 95% CI 4.55 0.003 AGE -0.819 0.239 (-1.384, -0.255) -3.43 0.011 VIF 1.00 Copyright © 2020 Pearson Education, Inc. 3-50 Simple Linear Regression Regression Equation STRIKES = 175.7 – 0.819 AGE a. The fitted regression line is yˆ  175.7  0.819 x. b. We see from the plot that there appears to be a moderate negative linear relationship between age and the mean number of strikes. ˆ0  175.7 Since 0 is not in the observed range of values of age, ˆ0 has no meaning. ˆ1  0.819 For each additional day of age for the fish, we estimate that the mean number of strikes will decrease by 0.819 strikes. To determine if there is a linear relationship between age of fish and number of strikes, we test: H 0 : 1  0 H a : 1  0 The test statistic is t  3.43 and the p-value is p  0.011. Since the p-value is less than   p  0.011  0.05  , H0 is rejected. There is sufficient evidence to indicate there is a linear relationship between age of fish and number of strikes at   0.05. r 2  0.6276 62.76% of the variability of the mean number of strikes about their mean is explained by the linear relationship between age and number of strikes. Using MINITAB, a scatterplot of the data is: Fitted Line Plot TIME = 4.790 + 0.01439 DEPTH 13 S R-Sq R-Sq(adj) 12 1.43219 63.0% 60.5% 11 10 TIME 3.82 9 8 7 6 5 4 0 100 200 300 400 DEPTH There appears to be a linear relationship between the time to drill 5 feet and the depth at which drilling begins. 
Using MINITAB, the analyses of the data are:

Analysis of Variance
Source       DF  Adj SS  Adj MS  F-Value  P-Value
Regression    1  52.38   52.378  25.54    0.000
  DEPTH       1  52.38   52.378  25.54    0.000
Error        15  30.77    2.051
Total        16  83.15

Model Summary
S         R-sq     R-sq(adj)  R-sq(pred)
1.43219   63.00%   60.53%     52.23%

Coefficients
Term       Coef      SE Coef   T-Value  P-Value  VIF
Constant   4.790     0.666     7.19     0.000
DEPTH      0.01439   0.00285   5.05     0.000    1.00

Regression Equation
TIME = 4.790 + 0.01439 DEPTH

The fitted regression line is $\hat{y} = 4.790 + 0.01439x$.

$\hat{\beta}_0 = 4.790$: We estimate the mean time to drill 5 feet when starting at a depth of 0 feet is 4.79 minutes.

$\hat{\beta}_1 = 0.01439$: For each additional foot of depth, we estimate that the mean time to drill 5 feet will increase by 0.01439 minutes.

To determine if there is a linear relationship between depth and time, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is t = 5.05 and the p-value is p = 0.000. Since the p-value is less than $\alpha$ (p = 0.000 < 0.05), $H_0$ is rejected. There is sufficient evidence to indicate there is a linear relationship between depth and time at $\alpha = 0.05$.

$r^2 = 0.6300$: 63.00% of the variability of the time to drill 5 feet about its mean is explained by the linear relationship between time to drill and the depth at which drilling starts.

3.83 a. To determine if body plus head rotation and active head movement are positively linearly related, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 > 0$

The test statistic is $t = \dfrac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \dfrac{0.88 - 0}{0.14} = 6.29$.

The rejection region requires $\alpha = 0.05$ in the upper tail of the t distribution with df = n - 2 = 39 - 2 = 37. From Table 2, Appendix D, $t_{0.05} \approx 1.687$. The rejection region is t > 1.687.

Since the observed value of the test statistic falls in the rejection region (t = 6.29 > 1.687), $H_0$ is rejected.
There is sufficient evidence to indicate that the two variables are positively linearly related at $\alpha = 0.05$.

b. For confidence level 0.90, $\alpha = 0.10$ and $\alpha/2 = 0.10/2 = 0.05$. From Table 2, Appendix D, with df = n - 2 = 39 - 2 = 37, $t_{0.05} \approx 1.687$. The confidence interval is:

$\hat{\beta}_1 \pm t_{0.05}\, s_{\hat{\beta}_1} \Rightarrow 0.88 \pm 1.687(0.14) \Rightarrow 0.88 \pm 0.24 \Rightarrow (0.64, 1.12)$

We are 90% confident that the true value of $\beta_1$ is between 0.64 and 1.12.

c. Because the interval in part b contains the value 1, there is no evidence that the true slope of the line differs from 1.

3.84 Using MINITAB, the analyses of the data are:

Analysis of Variance
Source         DF  Adj SS  Adj MS  F-Value  P-Value
Regression      1   6.096  6.0958  6.74     0.021
  RECOVERY      1   6.096  6.0958  6.74     0.021
Error          14  12.654  0.9039
  Lack-of-Fit   7   7.474  1.0677  1.44     0.320
  Pure Error    7   5.180  0.7400
Total          15  18.750

Model Summary
S          R-sq     R-sq(adj)  R-sq(pred)
0.950722   32.51%   27.69%     19.69%

Coefficients
Term       Coef     SE Coef  T-Value  P-Value  VIF
Constant   2.970    0.790    3.76     0.002
RECOVERY   0.1267   0.0488   2.60     0.021    1.00

Regression Equation
LACTATE = 2.970 + 0.1267 RECOVERY

To determine if blood lactate level is linearly related to perceived recovery, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is t = 2.60 and the p-value is p = 0.021. Since the p-value is less than $\alpha$ (p = 0.021 < 0.10), $H_0$ is rejected. There is sufficient evidence to indicate blood lactate level is linearly related to perceived recovery at $\alpha = 0.10$.

3.85 a. This relationship will have a negative correlation since the researchers claim an "inverse relationship".

b. Solving $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ for r, using the smallest value of t that leads to a statistically significant result, gives $r^2 = \dfrac{t^2}{t^2 + n - 2}$. So if t = 1.645 (in absolute value) leads to a rejection of $H_0: \rho = 0$, then

$r^2 = \dfrac{(1.645)^2}{(1.645)^2 + 337 - 2} = 0.00801$.

Thus, $r = -\sqrt{0.00801} = -0.0895$, since r is negative.

3.86 a.
Using MINITAB, the results are:

Analysis of Variance
Source       DF  Seq SS  Contribution  Adj SS  Adj MS   F-Value  P-Value
Regression    1  0.8309  85.38%        0.8309  0.83089  46.73    0.000
  TEMP        1  0.8309  85.38%        0.8309  0.83089  46.73    0.000
Error         8  0.1423  14.62%        0.1423  0.01778
Total         9  0.9731  100.00%

Model Summary
S          R-sq     R-sq(adj)  PRESS      R-sq(pred)
0.133347   85.38%   83.56%     0.340173   65.04%

Coefficients
Term       Coef       SE Coef   95% CI                 T-Value  P-Value  VIF
Constant   -13.49     2.07      (-18.27, -8.71)        -6.51    0.000
TEMP       -0.05283   0.00773   (-0.07065, -0.03501)   -6.84    0.000    1.00

Regression Equation
PROPPASS = -13.49 - 0.05283 TEMP

The fitted regression line is $\hat{y} = -13.49 - 0.0528x$.

$\hat{\beta}_0 = -13.49$: Since 0 is not within the range of observed values of temperature, $\hat{\beta}_0$ has no meaning.

$\hat{\beta}_1 = -0.0528$: For each degree increase in temperature, the mean proportion of impurity is estimated to decrease by 0.0528.

b. From the printout, the 95% confidence interval for $\beta_1$ is (-0.07065, -0.03501). We estimate the mean proportion of impurity will decrease by between 0.03501 and 0.07065 for each degree increase in temperature. Because 0 is not contained in this interval, there is evidence to indicate that temperature contributes information about the proportion of impurity passing through helium.

c. From the printout, $r^2 = 0.8538$. 85.38% of the variability in the proportions of impurity passing through helium around their mean is explained by the linear relationship between the temperature and the proportion of impurity.

d. Using MINITAB, the prediction interval is:

Settings
Variable  Setting
TEMP      -273

Prediction
Fit        SE Fit      95% CI                 95% PI
0.931953   0.0557562   (0.803379, 1.06053)    (0.598655, 1.26525)

The 95% prediction interval is (0.5987, 1.2653). We are 95% confident that the actual proportion of impurities will be between 0.5987 and 1.2653 when the temperature is -273 degrees.
Since the proportion cannot be greater than 1, the interval really is (0.5987, 1.0).

e. We have no idea what the relationship between temperature and proportion of impurity looks like outside the observed range.

3.87 a. Piano: r = 0.447. Because this value is near 0.5, there is a slight positive linear relationship between recognition exposure time and goodness of view for piano.

Bench: r = -0.057. Because this value is extremely close to 0, there is an extremely weak negative linear relationship between recognition exposure time and goodness of view for bench.

Motorbike: r = 0.619. Because this value is near 0.5, there is a moderate positive linear relationship between recognition exposure time and goodness of view for motorbike.

Armchair: r = 0.294. Because this value is fairly close to 0, there is a weak positive linear relationship between recognition exposure time and goodness of view for armchair.

Teapot: r = 0.949. Because this value is very close to 1, there is a strong positive linear relationship between recognition exposure time and goodness of view for teapot.

b. Piano: $r^2 = (0.447)^2 = 0.1998$. 19.98% of the total sample variability around the sample mean recognition exposure time is explained by the linear relationship between the recognition exposure time and the goodness of view for piano.

Bench: $r^2 = (-0.057)^2 = 0.0032$. 0.32% of the total sample variability around the sample mean recognition exposure time is explained by the linear relationship between the recognition exposure time and the goodness of view for bench.

Motorbike: $r^2 = (0.619)^2 = 0.3832$. 38.32% of the total sample variability around the sample mean recognition exposure time is explained by the linear relationship between the recognition exposure time and the goodness of view for motorbike.
Armchair: $r^2 = (0.294)^2 = 0.0864$. 8.64% of the total sample variability around the sample mean recognition exposure time is explained by the linear relationship between the recognition exposure time and the goodness of view for armchair.

Teapot: $r^2 = (0.949)^2 = 0.9006$. 90.06% of the total sample variability around the sample mean recognition exposure time is explained by the linear relationship between the recognition exposure time and the goodness of view for teapot.

c. The test is:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

Following are the values of $\alpha$ and $t_{\alpha/2}$ that correspond to df = n - 2 = 25 - 2 = 23.

$\alpha$   $t_{\alpha/2}$
0.20       1.319
0.10       1.714
0.05       2.069
0.02       2.500
0.01       2.807
0.002      3.485
0.001      3.767

Piano: t = 2.40. 2.069 < 2.40 < 2.500 $\Rightarrow$ p $\approx$ 0.025.

Bench: t = 0.27. 0.27 < 1.319 $\Rightarrow$ p > 0.2. $H_0$ is not rejected. There is insufficient evidence to indicate that there is a linear relationship between goodness of view and recognition exposure time for bench for $\alpha \le 0.2$.

Motorbike: t = 3.78. 3.78 > 3.767 $\Rightarrow$ p < 0.001. $H_0$ can be rejected for $\alpha \ge 0.001$. There is sufficient evidence to indicate that there is a linear relationship between goodness of view and recognition exposure time for motorbike for $\alpha \ge 0.001$.

Armchair: t = 1.47. 1.319 < 1.47 < 1.714 $\Rightarrow$ p $\approx$ 0.15. $H_0$ cannot be rejected for levels of significance $\alpha < 0.15$. There is insufficient evidence to indicate that there is a linear relationship between goodness of view and recognition exposure time for armchair for $\alpha < 0.15$.

Teapot: t > 3.767 $\Rightarrow$ p < 0.001. $H_0$ can be rejected for $\alpha \ge 0.001$. There is sufficient evidence to indicate that there is a linear relationship between goodness of view and recognition exposure time for teapot for $\alpha \ge 0.001$.

3.88 a. Using MINITAB, the scatterplot of the data is:

[Fitted line plot of PIPE versus GUESS]

There is a slight positive linear trend to the data.

b.
Using MINITAB, the results are:

Analysis of Variance
Source         DF  Adj SS  Adj MS  F-Value  P-Value
Regression      1   1779   1778.9  2.63     0.118
  GUESS         1   1779   1778.9  2.63     0.118
Error          24  16261    677.6
  Lack-of-Fit  20  14728    736.4  1.92     0.278
  Pure Error    4   1534    383.4
Total          25  18040

Model Summary
S         R-sq    R-sq(adj)  R-sq(pred)
26.0298   9.86%   6.11%      0.00%

Coefficients
Term       Coef    SE Coef  T-Value  P-Value  VIF
Constant   30.1    11.4     2.63     0.015
GUESS      0.308   0.190    1.62     0.118    1.00

Regression Equation
PIPE = 30.1 + 0.308 GUESS

The fitted regression line is $\hat{y} = 30.1 + 0.308x$.

$\hat{\beta}_0 = 30.1$: Because 0 is not within the observed values of the dowsers' guesses, $\hat{\beta}_0$ has no meaning.

c. To determine if the model is statistically useful for predicting actual pipe location, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is t = 1.62 and the p-value is p = 0.118. Since the p-value is not small, $H_0$ is not rejected. There is insufficient evidence to indicate the model is statistically useful for predicting actual pipe location for any $\alpha < 0.118$.

d. Since there is no statistical evidence that there is a linear relationship between the dowsers' guesses and the pipe location, this refutes the conclusion made by the German physicists. In addition, these were the 'best' results of the 'best' dowsers. If there is no relationship between the dowsers' guesses and the pipe location for the 'best' of the 'best', there will not be a relationship between dowsers' guesses and the pipe locations for all of the dowsers.

3.89 a. Using MINITAB, the scatterplot is:

[Fitted line plot of HEIGHT versus DIAMETER: HEIGHT = 9.147 + 0.4815 DIAMETER]

There appears to be a positive linear relationship between breast height diameter and height.

b.
Using MINITAB, the results are:

Analysis of Variance
Source         DF  Adj SS   Adj MS   F-Value  P-Value
Regression      1  183.245  183.245  65.10    0.000
  DIAMETER      1  183.245  183.245  65.10    0.000
Error          34   95.703    2.815
  Lack-of-Fit  27   87.893    3.255  2.92     0.073
  Pure Error    7    7.810    1.116
Total          35  278.947

Model Summary
S         R-sq     R-sq(adj)  R-sq(pred)
1.67773   65.69%   64.68%     57.07%

Coefficients
Term       Coef     SE Coef  T-Value  P-Value  VIF
Constant   9.15     1.12     8.16     0.000
DIAMETER   0.4815   0.0597   8.07     0.000    1.00

Regression Equation
HEIGHT = 9.15 + 0.4815 DIAMETER

The least squares line is $\hat{y} = 9.15 + 0.4815x$, with $\hat{\beta}_0 = 9.15$ and $\hat{\beta}_1 = 0.4815$.

c. The least squares line is printed on the scatterplot in part a.

d. To determine if the breast height diameter contributes information for the prediction of tree height, we test:

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The test statistic is t = 8.07 and the p-value is p = 0.000. Since the p-value is less than $\alpha$ (p = 0.000 < 0.05), $H_0$ is rejected. There is sufficient evidence to indicate the breast height diameter contributes information for the prediction of tree height at $\alpha = 0.05$.

e. Using MINITAB, the results are:

Settings
Variable   Setting
DIAMETER   20

Prediction
Fit       SE Fit     90% CI                 90% PI
18.7763   0.299602   (18.2697, 19.2829)     (15.8945, 21.6581)

The 90% confidence interval is (18.2697, 19.2829). We are 90% confident that the mean height of trees is between 18.2697 m and 19.2829 m when the breast height diameter is 20 cm.
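The interval limits in part e can be reproduced approximately from the printed Fit, SE Fit, and error mean square. The sketch below assumes a critical value of $t_{0.05} \approx 1.691$ for 34 degrees of freedom, which is a t-table value and does not appear in the printout; the variable names are illustrative only.

```python
import math

# Values copied from the MINITAB printout for HEIGHT regressed on DIAMETER.
intercept = 9.147     # constant, as shown on the fitted line plot
slope = 0.4815        # DIAMETER coefficient
fit = 18.7763         # MINITAB fit at DIAMETER = 20
se_fit = 0.299602     # standard error of the fitted mean
mse = 2.815           # Adj MS for Error
t_crit = 1.691        # ASSUMPTION: t-table value for upper-tail 0.05, df = 34

# Point estimate from the rounded coefficients (differs slightly from Fit).
point_estimate = intercept + slope * 20

# 90% confidence interval for the mean height at DIAMETER = 20.
half_ci = t_crit * se_fit
ci = (fit - half_ci, fit + half_ci)

# 90% prediction interval for one tree's height at DIAMETER = 20.
se_pred = math.sqrt(mse + se_fit ** 2)
half_pi = t_crit * se_pred
pi = (fit - half_pi, fit + half_pi)

print(f"point estimate = {point_estimate:.3f}")
print(f"90% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"90% PI = ({pi[0]:.4f}, {pi[1]:.4f})")
```

The prediction interval is wider than the confidence interval because its standard error adds the error variance for an individual tree to the variance of the estimated mean.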