Statistics for Business and Economics, 13th Edition Solution Manual

Chapter 2 Methods for Describing Sets of Data

Copyright © 2018 Pearson Education, Inc.

2.1 First, we find the frequency of the grade A. The sum of the frequencies for all five grades must be 200. Therefore, subtract the sum of the frequencies of the other four grades from 200. The frequency for grade A is:

\[ 200 - (36 + 90 + 30 + 28) = 200 - 184 = 16 \]

To find the relative frequency for each grade, divide the frequency by the total sample size, 200. The relative frequency for the grade B is 36/200 = .18. The rest of the relative frequencies are found in a similar manner and appear in the table:

Grade on Statistics Exam    Frequency    Relative Frequency
A: 90-100                   16           .08
B: 80-89                    36           .18
C: 65-79                    90           .45
D: 50-64                    30           .15
F: Below 50                 28           .14
Total                       200          1.00

2.2 a. To find the frequency for each class, count the number of times each letter occurs. The frequencies for the three classes are:

Class    Frequency
X        8
Y        9
Z        3
Total    20

b. The relative frequency for each class is found by dividing the frequency by the total sample size. The relative frequency for the class X is 8/20 = .40. The relative frequency for the class Y is 9/20 = .45. The relative frequency for the class Z is 3/20 = .15.

Class    Frequency    Relative Frequency
X        8            .40
Y        9            .45
Z        3            .15
Total    20           1.00

c. The frequency bar chart is:
[Figure: bar chart of Frequency by Class (X, Y, Z)]

d. The pie chart for the frequency distribution is:
[Figure: Pie Chart of Class. X 40.0%, Y 45.0%, Z 15.0%]

2.3 a. \( p_U = 107/174 = .615 \)

b. \( p_S = 57/174 = .328 \)

c. \( p_R = 10/174 = .057 \)

d. \( .615 \times 360 = 221.4^\circ \), \( .328 \times 360 = 118.1^\circ \), \( .057 \times 360 = 20.5^\circ \)

e. Using MINITAB, the pie chart is:
[Figure: Pie Chart of Location. Urban 61.5%, Suburban 32.8%, Rural 5.7%]

f. 61.5% of the STEM participants are from urban areas, 32.8% are from suburban areas, and 5.7% are from rural areas.

g. Using MINITAB, the bar chart is:
[Figure: bar chart of Percent by Loc (Urban, Suburban, Rural); percent is calculated within all data]
Both charts give the same information.

2.4 a. According to the pie chart, .760 of the sample currently have a cable/satellite TV subscription at home. The total number of adults sampled is \( 1,521 + 180 + 300 = 2,001 \). The proportion is \( 1,521/2,001 = .760 \).

b. Using MINITAB, the pie chart is:
[Figure: Pie Chart of Subscribe. Cable TV 83.5%, Cord cutter 16.5%]

2.5 a. The type of graph is a bar graph.

b. The variable measured for each of the robots is type of robotic limbs.

c. From the graph, the design used the most is the “legs only” design.

d. The relative frequencies are computed by dividing the frequencies by the total sample size. The total sample size is n = 106. The relative frequencies for each of the categories are:

Type of Limbs    Frequency    Relative Frequency
None             15           15/106 = .142
Both             8            8/106 = .075
Legs ONLY        63           63/106 = .594
Wheels ONLY      20           20/106 = .189
Total            106          1.000

e. Using MINITAB, the Pareto diagram is:
[Figure: Pareto diagram of Relative Frequency by Type (Legs, Wheels, None, Both); percent within all data]

2.6 a. Region is qualitative because it is not measured using numbers.

b. \( p_{AP} = 48/150 = .32 \), \( p_C = 10/150 = .07 \), \( p_E = 34/150 = .23 \), \( p_{LA} = 29/150 = .19 \), \( p_{ME/A} = 3/150 = .02 \), \( p_{US} = 26/150 = .17 \)

c. Using MINITAB, the plot is:
[Figure: bar chart of Percent by Region (Asia-Pacific, Canada, Europe, Latin America, Middle East/Africa, United States); percent is calculated within all data]

d. The regions that most of the top 150 credit card users serve are Asia-Pacific, Europe, Latin America, and the United States.

2.7 a. Using MINITAB, the pie chart is:
[Figure: Pie Chart of Product. Windows 64.0%, Office 24.0%, Explorer 12.0%]
Explorer had the lowest proportion of security issues, with proportion \( 6/50 = .12 \).

b. Using MINITAB, the Pareto chart is:
[Figure: Pareto chart of Percent by Bulletins (Remote code execution, Privilege elevation, Information disclosure, Denial of service, Spoofing); percent is calculated within all data]
The security bulletin with the highest frequency is Remote code execution. Microsoft should focus on this repercussion.

2.8 a. Using MINITAB, the Pareto chart is:
[Figure: Pareto chart of Percent by Network/Channel (WLAN/Single, WLAN/Multi, WSN/Single, WSN/Multi, AHN/Single, AHN/Multi); percent is calculated within all data]
The network type and number of channels that suffered the largest number of jamming attacks is WLAN/Single. The network/number of channels types that received the next largest numbers of jamming attacks are WSN/Single and WLAN/Multi. The network/number of channels type that suffered the fewest jamming attacks is AHN/Multi.

b. Using MINITAB, the pie chart is:
[Figure: Pie Chart of Network. WLAN 56.3%, WSN 27.5%, AHN 16.3%]
The network type that suffered the most jamming attacks is WLAN, with more than half of the attacks. The network type that suffered the fewest jamming attacks is AHN.

2.9 Using MINITAB, the pie chart is:
[Figure: Pie Chart of Degree. First 52.7%, None 36.9%, Post 10.4%]
A little over half of the successful candidates had a First (Bachelor’s) degree, while a little more than a third of the successful candidates had no degree. Only about 10% of the successful candidates had graduate degrees.
2.10 Using MINITAB, the bar graphs of the 2 waves are:
[Figure: Chart of Job Status. Percent by Job Status (WorkMBA, WorkNoMBA, NoWorkGradSch, NoWorkBusSch); panel variable: Wave (1, 2); percent within all data]
In wave 1, most of those taking the GMAT were working (2657/3244 = .819) and none had MBA’s. About 20% were not working but were in either a 4-year institution or other graduate school ((36 + 551)/3244 = .181). In wave 2, almost all were now working ((1787 + 1372)/3244 = .974). Of those working, more than half had MBA’s (1787/(1787 + 1372) = .566). Of those not working, most were in another graduate school.

2.11 Using MINITAB, the Pareto diagram for the data is:
[Figure: Chart of Tenants. Percent by Tenants (Small, SmallStandard, Large, Major, Anchor); percent within all data]
Most of the tenants in UK shopping malls are small or small standard. They account for approximately 84% of all tenants ((711 + 819)/1,821 = .84). Very few (less than 1%) of the tenants are anchors.

2.12 Using MINITAB, the side-by-side bar graphs are:
[Figure: Chart of Acquisitions. Percent by Acquisitions (No, Yes); panel variable: Year (1980, 1990, 2000); percent within all data]
In 1980, very few firms had acquisitions (18/1,963 = .009). By 1990, the proportion of firms having acquisitions increased to 350/2,197 = .159. By 2000, the proportion of firms having acquisitions increased to 748/2,778 = .269.

2.13 a. Using MINITAB, the pie chart of the data is:
[Figure: Pie Chart of City. NY 33.0%, CH 25.9%, SF 25.0%, LA 16.1%]

b. Using MINITAB, the pie chart for San Francisco is:
[Figure: Pie Chart of Rating, City = SF. Good 68.1%, Bad 21.7%, Excellent 10.1%]

c. Using MINITAB, the bar charts are:
[Figure: Chart of Tweets. Percent of Tweets by Rating (Excellent, Good, Bad); panel variable: City (CH, LA, NY, SF); percent is calculated within all data]

d. In all cities, most customers rated the iPhone 6 as ‘good’, while very few rated the iPhone 6 as excellent.

2.14 Using MINITAB, a pie chart of the data is:
[Figure: Pie Chart of Measure. Total visitors 26.7%, Funds Raised 23.3%, Big Shows 20.0%, Paying visitors 16.7%, Members 13.3%]
Since the sizes of the slices are close to each other, it appears that the researcher is correct. There is a large amount of variation within the museum community with regard to performance measurement and evaluation.

2.15 a. The variable measured by Performark is the length of time it took for each advertiser to respond back.

b. The pie chart is:
[Figure: Pie Chart of Response Time. 60-120 days 37.8%, 13-59 days 25.6%, Never responded 23.3%, >120 days 13.3%]

c. Twenty-one percent, or .21(17,000) = 3,570, of the advertisers never respond to the sales lead.

d. The information from the pie chart does not indicate how effective the “bingo cards” are. It just indicates how long it takes advertisers to respond, if at all.

2.16 Using MINITAB, the side-by-side bar graphs are:
[Figure: Chart of Dive. Percent by Dive (Left, Middle, Right); panel variable: Situation (Ahead, Behind, Tied); percent within all data]
From the graphs, it appears that if the team is either tied or ahead, the goal-keepers tend to dive either right or left with equal probability, with very few diving in the middle. However, if the team is behind, then the majority of goal-keepers tend to dive right (71%).

2.17 a. Using MINITAB, bar charts for the 3 variables are:
[Figure: Chart of Well Class. Count by Well Class (Private, Public)]
[Figure: Chart of Aquifer. Count by Aquifer (Bedrock, Unconsolidated)]
[Figure: Chart of Detection. Count by Detection (Below Limit, Detect)]

b. Using MINITAB, the side-by-side bar chart is:
[Figure: Chart of Detection. Percent by Detection (Below Limit, Detect); panel variable: Well Class (Private, Public); percent within all data]

c. Using MINITAB, the side-by-side bar chart is:
[Figure: Chart of Detection. Percent by Detection (Below Limit, Detect); panel variable: Aquifer (Bedrock, Unconsolidated); percent within all data]

d. From the bar charts in parts a-c, one can infer that most aquifers are bedrock and most levels of MTBE were below the limit (about 2/3). Also, the percentages of public wells versus private wells are relatively close. Approximately 80% of private wells are not contaminated, while only about 60% of public wells are not contaminated. The percentage of contaminated wells is about the same for both types of aquifers (about 30%).

2.18 Using MINITAB, the relative frequency histogram is:
[Figure: relative frequency histogram. Relative Frequency (0 to .25) by Class, with class boundaries .5, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5, 14.5, 16.5]

2.19 To find the number of measurements for each measurement class, multiply the relative frequency by the total number of observations, n = 500. The frequency table is:

Measurement Class    Relative Frequency    Frequency
.5 - 2.5             .10                   500(.10) = 50
2.5 - 4.5            .15                   500(.15) = 75
4.5 - 6.5            .25                   500(.25) = 125
6.5 - 8.5            .20                   500(.20) = 100
8.5 - 10.5           .05                   500(.05) = 25
10.5 - 12.5          .10                   500(.10) = 50
12.5 - 14.5          .10                   500(.10) = 50
14.5 - 16.5          .05                   500(.05) = 25
Total                                      500

Using MINITAB, the frequency histogram is:
[Figure: frequency histogram. Frequency (0 to 140) by Class, with class boundaries .5, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5, 14.5, 16.5]

2.20 a. The original data set has 1 + 3 + 5 + 7 + 4 + 3 = 23 observations.

b. For the bottom row of the stem-and-leaf display: The stem is 0. The leaves are 0, 1, 2. Assuming that the data are up to two digits, rounded off to the nearest whole number, the numbers in the original data set are 0, 1, and 2.

c. Again, assuming that the data are up to two digits, rounded off to the nearest whole number, the dot plot corresponding to all the data points is:
[Figure: dot plot of the data]

2.21 a. This is a frequency histogram because the number of observations is graphed for each interval rather than the relative frequency.

b. There are 14 measurement classes.

c. There are 49 measurements in the data set.

2.22 a. The graph is a frequency histogram.

b. The quantitative variable summarized in the graph is the fup/fumic ratio.

c. The proportion of ratios greater than 1 is \( \frac{8 + 5 + 1}{416} = \frac{14}{416} = .034 \).

d. The proportion of ratios less than .4 is \( \frac{181 + 108}{416} = \frac{289}{416} = .695 \).

2.23 a. Since the label on the vertical axis is Percent, this is a relative frequency histogram. We can divide the percents by 100% to get the relative frequencies.

b. Summing the percents represented by all of the bars above 100, we get approximately 12%.

2.24 a. Using MINITAB, the stem-and-leaf display and histogram are:

Stem-and-Leaf Display: SCORE
Stem-and-leaf of SCORE  N = 195
Leaf Unit = 1.0

   1    6  9
   1    7
   2    7  3
   2    7
   3    7  6
   4    7  8
   4    8
   4    8
   6    8  44
  16    8  6666677777
  26    8  8888899999
  40    9  00001111111111
  57    9  22222222223333333
  88    9  4444444444444555555555555555555
 (47)   9  66666666666666666666777777777777777777777777777
  60    9  8888888888999999999999999999999999
  26   10  00000000000000000000000000

[Figure: Histogram of SCORE. Frequency (0 to 50) by SCORE (72 to 100)]

b. From the stem-and-leaf display, there are only 6 observations with sanitation scores less than 86. The proportion of ships with accepted sanitation standards is (195 - 6)/195 = 189/195 = .97.

c. The score of 69 is highlighted in the stem-and-leaf display.

2.25 a. Using MINITAB, a dot plot of the data is:
[Figure: Dotplot of Acquisitions (0 to 840)]

b. By looking at the dot plot, one can conclude that the years 1996-2000 had the highest number of firms with at least one acquisition. The lowest number of acquisitions in that time frame (748) is almost 100 higher than the highest value from the remaining years.

2.26 a. Using MINITAB, a histogram of the current values of the 32 NFL teams is:
[Figure: Histogram of VALUE ($mil). Frequency by VALUE ($mil)]

b. Using MINITAB, a histogram of the 1-year change in current value for the 32 NFL teams is:
[Figure: Histogram of CHANGE (%). Frequency by CHANGE (%)]

c. Using MINITAB, a histogram of the debt-to-value ratios for the 32 NFL teams is:
[Figure: Histogram of DEBT/VALUE (%). Frequency by DEBT/VALUE (%)]

d. Using MINITAB, a histogram of the annual revenues for the 32 NFL teams is:
[Figure: Histogram of REVENUE ($mil). Frequency by REVENUE ($mil)]
e. Using MINITAB, a histogram of the operating incomes for the 32 NFL teams is:
[Figure: Histogram of INCOME ($mil). Frequency by INCOME ($mil)]

f. For all of the histograms, there is 1 team that has a very high score. The Dallas Cowboys have the largest values for current value, annual revenues, and operating income. However, the San Francisco 49ers have the highest 1-year change, while the Atlanta Falcons have the highest debt-to-value ratio. All of the graphs except the one showing the 1-Yr Value Changes are skewed to the right.

2.27 a. Using MINITAB, the frequency histograms for 2014 and 2010 SAT mathematics scores are:
[Figure: Histogram of MATH2014, MATH2010. Frequency by score (440 to 600), one panel per year]
It appears that the scores have not changed very much at all. The graphs are very similar.

b. Using MINITAB, the frequency histogram of the differences is:
[Figure: Histogram of Diff Math. Frequency by Diff Math (-90 to 30)]
From this graph of the differences, we can see that there are more observations to the right of 0 than to the left of 0. This indicates that, in general, the scores have improved since 2010.

c. From the graph, the largest improvement score is between 22.5 and 37.5. The actual largest score is 34 and it is associated with Wyoming.

2.28 Using MINITAB, the two dot plots are:
[Figure: Dotplot of Arrive, Depart (108 to 168)]
Yes. Most of the numbers of items arriving at the work center per hour are in the 135 to 165 area. Most of the numbers of items departing the work center per hour are in the 110 to 140 area. Because the number of items arriving is larger than the number of items departing, there will probably be some sort of bottleneck.
2.29 Using MINITAB, the stem-and-leaf display is:

Stem-and-Leaf Display: Dioxide
Stem-and-leaf of Dioxide  N = 16
Leaf Unit = 0.10

   5    0  12234
   7    0  55
  (2)   1  34
   7    1
   7    2  44
   5    2
   5    3  3
   4    3
   4    4  0000

The highlighted values are values that correspond to water specimens that contain oil. There is a tendency for crude oil to be present in water with lower levels of dioxide, as 6 of the 8 specimens with the lowest levels of dioxide contain oil.

2.30 a. Using MINITAB, the histogram of the number of deaths is:
[Figure: Histogram of Deaths. Frequency (0 to 12) by Deaths (0 to 1000)]

b. The interval containing the largest proportion of estimates is 0-50. Almost half of the estimates fall in this interval.

2.31 Yes, we would agree with the statement that honey may be the preferable treatment for the cough and sleep difficulty associated with childhood upper respiratory tract infection. For those receiving the honey dosage, 14 of the 35 children (or 40%) had improvement scores of 12 or higher. For those receiving the DM dosage, only 9 of the 33 (or 24%) children had improvement scores of 12 or higher. For those receiving no dosage, only 2 of the 37 children (or 5%) had improvement scores of 12 or higher. In addition, the median improvement score for those receiving the honey dosage was 11, the median for those receiving the DM dosage was 9 and the median for those receiving no dosage was 7.

2.32 a. Using MINITAB, the stem-and-leaf display is as follows, where the stems are the units place and the leaves are the decimal places:

Stem-and-Leaf Display: Time
Stem-and-leaf of Time  N = 49
Leaf Unit = 0.10

 (26)   1  00001122222344444445555679
  23    2  11446799
  15    3  002899
   9    4  11125
   4    5  24
   2    6
   2    7  8
   1    8
   1    9
   1   10  1

b. A little more than half (26/49 = .53) of all companies spent less than 2 months in bankruptcy. Only two of the 49 companies spent more than 6 months in bankruptcy.
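The counts quoted in 2.32 b can be re-derived from the stem-and-leaf display in part a. The following is a minimal Python sketch, assuming the leaves have been read off the display correctly; the names STEM_LEAVES and times_from_display are illustrative and are not part of the original solution.

```python
# Rebuild the 49 bankruptcy times (in months) from the stem-and-leaf
# display of 2.32 a. Leaf unit = 0.10, so stem 1 with leaf 3 is 1.3 months.
# Stems 6, 8 and 9 have no leaves and are omitted.
STEM_LEAVES = {
    1: "00001122222344444445555679",
    2: "11446799",
    3: "002899",
    4: "11125",
    5: "24",
    7: "8",
    10: "1",
}


def times_from_display(stem_leaves):
    """Expand a {stem: leaves} mapping into a sorted list of values."""
    values = []
    for stem, leaves in stem_leaves.items():
        for leaf in leaves:
            values.append(round(stem + int(leaf) / 10, 1))
    return sorted(values)


times = times_from_display(STEM_LEAVES)
n = len(times)
under_two = sum(1 for t in times if t < 2)   # less than 2 months
over_six = sum(1 for t in times if t > 6)    # more than 6 months

print(n, under_two, round(under_two / n, 2), over_six)
```

Reading the proportion directly off the display, as the solution does, gives the same numbers; the sketch only makes the counting explicit.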
c. A dot diagram will be used to compare the time in bankruptcy for the three types of “prepack” firms:
[Figure: Dotplot of Time vs Votes. Time (1.2 to 9.6) for Votes = Joint, None, Prepack]
It appears that, in general, the length of time in bankruptcy for firms using “prepacks” is less than that of firms not using “prepacks.”

d. The highlighted times in part a correspond to companies that were reorganized through a leverage buyout. There does not appear to be any pattern to these points. They appear to be scattered about evenly throughout the distribution of all times.

2.33 Using MINITAB, the histogram of the data is:
[Figure: Histogram of INTTIME. Frequency (0 to 60) by INTTIME (0 to 525)]
This histogram looks very similar to the one shown in the problem. Thus, it appears that there was minimal or no collaboration or collusion from within the company. We could conclude that the phishing attack against the organization was not an inside job.

2.34 Using MINITAB, the stem-and-leaf display for the data is:

Stem-and-Leaf Display: Time
Stem-and-leaf of Time  N = 25
Leaf Unit = 1.0

   3    3  239
   7    4  3499
  (7)   5  0011469
  11    6  34458
   6    7  13
   4    8  26
   2    9  5
   1   10  2

The numbers in bold represent delivery times associated with customers who subsequently did not place additional orders with the firm. Since there were only 2 customers with delivery times of 68 days or longer that placed additional orders, I would say the maximum tolerable delivery time is about 65 to 67 days. Everyone with delivery times less than 67 days placed additional orders.

2.35 Assume the data are a sample. The sample mean is:

\[ \bar{x} = \frac{\sum x}{n} = \frac{3.2 + 2.5 + 2.1 + 3.7 + 2.8 + 2.0}{6} = \frac{16.3}{6} = 2.717 \]

The median is the average of the middle two numbers when the data are arranged in order (since n = 6 is even). The data arranged in order are: 2.0, 2.1, 2.5, 2.8, 3.2, 3.7.
The middle two numbers are 2.5 and 2.8. The median is:

\[ \frac{2.5 + 2.8}{2} = \frac{5.3}{2} = 2.65 \]

2.36 a. \( \bar{x} = \frac{\sum x}{n} = \frac{85}{10} = 8.5 \)

b. \( \bar{x} = \frac{400}{16} = 25 \)

c. \( \bar{x} = \frac{35}{45} = .778 \)

d. \( \bar{x} = \frac{242}{18} = 13.44 \)

2.37 The mean and median of a symmetric data set are equal to each other. The mean is larger than the median when the data set is skewed to the right. The mean is less than the median when the data set is skewed to the left. Thus, by comparing the mean and median, one can determine whether the data set is symmetric, skewed right, or skewed left.

2.38 The median is the middle number once the data have been arranged in order. If n is even, there is not a single middle number. Thus, to compute the median, we take the average of the middle two numbers. If n is odd, there is a single middle number. The median is this middle number.

A data set with five measurements arranged in order is 1, 3, 5, 6, 8. The median is the middle number, which is 5.

A data set with six measurements arranged in order is 1, 3, 5, 5, 6, 8. The median is the average of the middle two numbers, which is \( \frac{5 + 5}{2} = \frac{10}{2} = 5 \).

2.39 Assume the data are a sample. The mode is the observation that occurs most frequently. For this sample, the mode is 15, which occurs three times.

The sample mean is:

\[ \bar{x} = \frac{\sum x}{n} = \frac{18 + 10 + 15 + 13 + 17 + 15 + 12 + 15 + 18 + 16 + 11}{11} = \frac{160}{11} = 14.545 \]

The median is the middle number when the data are arranged in order. The data arranged in order are: 10, 11, 12, 13, 15, 15, 15, 16, 17, 18, 18. The middle number is the 6th number, which is 15.

2.40 a. \( \bar{x} = \frac{\sum x}{n} = \frac{7 + \cdots + 4}{6} = \frac{15}{6} = 2.5 \)
Median = \( \frac{3 + 3}{2} = 3 \) (mean of 3rd and 4th numbers, after ordering)
Mode = 3

b. \( \bar{x} = \frac{\sum x}{n} = \frac{2 + \cdots + 4}{13} = \frac{40}{13} = 3.08 \)
Median = 3 (7th number, after ordering)
Mode = 3

c. \( \bar{x} = \frac{\sum x}{n} = \frac{51 + \cdots + 37}{10} = \frac{496}{10} = 49.6 \)
Median = \( \frac{48 + 50}{2} = 49 \) (mean of 5th and 6th numbers, after ordering)
Mode = 50

2.41 a. For a distribution that is skewed to the left, the mean is less than the median.

b. For a distribution that is skewed to the right, the mean is greater than the median.

c. For a symmetric distribution, the mean and median are equal.

2.42 a. The average score for Energy Star is 4.44. The average score is close to 5, meaning the average score is close to ‘very familiar’.

b. The median score for Energy Star is 5. At least half of the respondents indicated that they are very familiar with the ecolabel Energy Star.

c. The mode score for Energy Star is 5. More respondents answered ‘very familiar’ to Energy Star than any other option.

d. The ecolabel that appears to be most familiar to travelers is Energy Star.

2.43 a. This statistic represents a population mean because it is computed for every freshman who attended the university in 2015. The average financial aid awarded to freshmen at Harvard University is $41,555.

b. This statistic represents a sample median because it is computed for a sample of alumni. The median salary during early career for alumni of Harvard University is $61,400. Half of the alumni from Harvard make more than $61,400 during their early career.

2.44 a. The mean is

\[ \bar{x} = \frac{\sum x}{n} = \frac{9 + (-.1) + (-1.6) + 14.6 + 16.0 + 7.7 + 19.9 + 9.8 + 3.2 + 24.8 + 17.6 + 10.7 + 9.1}{13} = \frac{140.7}{13} = 10.82 \]

The average annualized percentage return on investment for 13 randomly selected stock screeners is 10.82.

b. Since the number of observations is odd, the median is the middle number once the data have been arranged in order. The data arranged in order are:

-1.6  -.1  3.2  7.7  9.0  9.1  9.8  10.7  14.6  16.0  17.6  19.9  24.8

The middle number is 9.8, which is the median. Half of the annualized percentage returns on investment are below 9.8 and half are above 9.8.

2.45 a. The mean years of experience is \( \bar{x} = \frac{\sum x}{n} = \frac{30 + 15 + 10 + \cdots + 25}{17} = \frac{303}{17} = 17.824 \). The average number of years of experience is 17.824 years.

b. To find the median, we first arrange the data in order from lowest to highest:

3  5  6  9  10  10  10  15  20  20  25  25  25  30  30  30  30

Since there are an odd number of observations, the median is the middle number, which is 20. Half of interviewees have less than 20 years of experience.

c. The mode is 30. More interviewees had 30 years of experience than any other value.

2.46 a. The sample mean is:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1.72 + 2.50 + 2.16 + \cdots + 1.95}{20} = \frac{37.62}{20} = 1.881 \]

The sample average surface roughness of the 20 observations is 1.881.

b. The median is found as the average of the 10th and 11th observations, once the data have been ordered. The ordered data are:

1.06  1.09  1.19  1.26  1.27  1.40  1.51  1.72  1.95  2.03
2.05  2.13  2.13  2.16  2.24  2.31  2.41  2.50  2.57  2.64

The 10th and 11th observations are 2.03 and 2.05. The median is:

\[ \frac{2.03 + 2.05}{2} = \frac{4.08}{2} = 2.04 \]

The middle surface roughness measurement is 2.04. Half of the sample measurements were less than 2.04 and half were greater than 2.04.

c. The data are somewhat skewed to the left. Thus, the median might be a better measure of central tendency than the mean. The few small values in the data tend to make the mean smaller than the median.

2.47 a. The mean permeability for group A sandstone slices is 73.62 mD. The average permeability for group A sandstone is 73.62 mD. The median permeability for group A sandstone is 70.45 mD. Half of the sandstone slices in group A have permeability less than 70.45 mD.

b. The mean permeability for group B sandstone slices is 128.54 mD. The average permeability for group B sandstone is 128.54 mD. The median permeability for group B sandstone is 139.30 mD. Half of the sandstone slices in group B have permeability less than 139.30 mD.

c. The mean permeability for group C sandstone slices is 83.07 mD. The average permeability for group C sandstone is 83.07 mD.
The median permeability for group C sandstone is 78.65 mD. Half of the sandstone slices in group C have permeability less than 78.65 mD.

d. The mode permeability score for group C sandstone is 70.9. More sandstone slices in group C had permeability scores of 70.9 than any other value.

e. Weathering type B appears to result in faster decay because the mean, median, and mode values for group B are higher than those for group C.

2.48 a. The mean is 67.755. The statement is accurate.

b. The median is 68.000. The statement is accurate.

c. The mode is 64. The statement is not accurate. A better statement would be: “The most common reported level of support for corporate sustainability for the 992 senior managers was 64.”

d. Since the mean and median are almost the same, the distribution of the 992 support levels should be fairly symmetric. The histogram in Exercise 2.23 is almost symmetric.

2.49 a. The median is the middle number (18th) once the data have been arranged in order because n = 35 is odd. The honey dosage data arranged in order are:

4,5,6,8,8,8,8,9,9,9,9,10,10,10,10,10,10,11,11,11,11,12,12,12,12,12,12,13,13,14,15,15,15,15,16

The 18th number is the median = 11.

b. The median is the middle number (17th) once the data have been arranged in order because n = 33 is odd. The DM dosage data arranged in order are:

3,4,4,4,4,4,4,6,6,6,7,7,7,7,7,8,9,9,9,9,9,10,10,10,11,12,12,12,12,12,13,13,15

The 17th number is the median = 9.

c. The median is the middle number (19th) once the data have been arranged in order because n = 37 is odd. The No dosage data arranged in order are:

0,1,1,1,3,3,4,4,5,5,5,6,6,6,6,7,7,7,7,7,7,7,7,8,8,8,8,8,8,9,9,9,9,10,11,12,12

The 19th number is the median = 7.

d. Since the median for the honey dosage is larger than the other two, it appears that the honey dosage leads to more improvement than the other two treatments.

2.50 a. The mean dioxide level is \( \bar{x} = \frac{3.3 + 0.5 + 1.3 + \cdots + 4.0}{16} = \frac{29}{16} = 1.81 \). The average dioxide amount is 1.81.

b. Since the number of observations is even, the median is the average of the middle 2 numbers once the data are arranged in order. The data arranged in order are:

0.1  0.2  0.2  0.3  0.4  0.5  0.5  1.3  1.4  2.4  2.4  3.3  4.0  4.0  4.0  4.0

The median is \( \frac{1.3 + 1.4}{2} = \frac{2.7}{2} = 1.35 \). Half of the dioxide levels are below 1.35 and half are above 1.35.

c. The mode is the number that occurs the most. For this data set the mode is 4.0. The most frequent level of dioxide is 4.0.

d. Since the number of observations is even, the median is the average of the middle 2 numbers once the data are arranged in order. The data arranged in order are:

0.1  0.3  1.4  2.4  2.4  3.3  4.0  4.0  4.0  4.0

The median is \( \frac{2.4 + 3.3}{2} = \frac{5.7}{2} = 2.85 \).

e. Since the number of observations is even, the median is the average of the middle 2 numbers once the data are arranged in order. The data arranged in order are:

0.2  0.2  0.4  0.5  0.5  1.3

The median is \( \frac{0.4 + 0.5}{2} = \frac{0.9}{2} = 0.45 \).

f. The median level of dioxide when crude oil is present is 0.45. The median level of dioxide when crude oil is not present is 2.85. It is apparent that the level of dioxide is much higher when crude oil is not present.

2.51 a. Skewed to the right. There will be a few people with very high salaries such as the president and football coach.

b. Skewed to the left. On an easy test, most students will have high scores with only a few low scores.

c. Skewed to the right. On a difficult test, most students will have low scores with only a few high scores.

d. Skewed to the right. Most students will have a moderate amount of time studying while a few students might study a long time.

e. Skewed to the left. Most cars will be relatively new with a few much older.

f. Skewed to the left. Most students will take the entire time to take the exam while a few might leave early.

2.52 a. The sample mean is:

\[ \bar{x} = \frac{\sum x}{n} = \frac{3.58 + 3.48 + 3.27 + \cdots + 1.17}{40} = \frac{77.07}{40} = 1.927 \]

The median is found as the average of the 20th and 21st observations, once the data have been ordered. The 20th and 21st observations are 1.75 and 1.76. The median is:

\[ \frac{1.75 + 1.76}{2} = \frac{3.51}{2} = 1.755 \]

The mode is the number that occurs the most and is 1.4, which occurs 3 times.

b. The sample average driving performance index is 1.927. The median driving performance index is 1.755. Half of all driving performance indexes are less than 1.755 and half are higher. The most common driving performance index value is 1.4.

c. Since the mean is larger than the median, the data are skewed to the right. Using MINITAB, a histogram of the driving performance index values is:
[Figure: Histogram of INDEX. Frequency (0 to 10) by INDEX (1.5 to 3.5)]

2.53 For the “Joint exchange offer with prepack” firms, the mean time is 2.6545 months, and the median is 1.5 months. Thus, the average time spent in bankruptcy for “Joint” firms is 2.6545 months, while half of the firms spend 1.5 months or less in bankruptcy.

For the “No prefiling vote held” firms, the mean time is 4.2364 months, and the median is 3.2 months. Thus, the average time spent in bankruptcy for “No prefiling vote held” firms is 4.2364 months, while half of the firms spend 3.2 months or less in bankruptcy.

For the “Prepack solicitation only” firms, the mean time is 1.8185 months, and the median is 1.4 months. Thus, the average time spent in bankruptcy for “Prepack solicitation only” firms is 1.8185 months, while half of the firms spend 1.4 months or less in bankruptcy.

Since the means and medians for the three groups of firms differ quite a bit, it would be unreasonable to use a single number to locate the center of the time in bankruptcy. Three different “centers” should be used.

2.54 a. The mean is \( \bar{x} = \frac{\sum x}{n} = \frac{2 + 1 + 1 + \cdots + 1}{30} = \frac{62}{30} = 2.067 \). The average number of nuclear power plants per state for states that have nuclear power plants is 2.067.
The median is found by first arranging the data in order from smallest to largest:
1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 4 4 5 6
Since there are an even number of data points, the median is the average of the middle two numbers, which is \( \dfrac{2 + 2}{2} = 2 \). At least half of the states with nuclear power plants have 2 or fewer plants.
The mode is 1. The most common number of plants among states that have nuclear power plants is 1.
b. For regulated states: The mean is \( \bar{x} = \dfrac{\sum x}{n} = \dfrac{2 + 1 + 1 + \cdots + 1}{17} = \dfrac{31}{17} = 1.824 \). The average number of nuclear power plants per state for regulated states that have nuclear power plants is 1.824.
The median is found by first arranging the data in order from smallest to largest:
1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 4
Since there are an odd number of data points, the median is the middle number, which is 2. At least half of the regulated states with nuclear power plants have 2 or fewer plants.
The modes are 1 and 2. Most regulated states that have nuclear power plants have 1 or 2.
c. For deregulated states: The mean is \( \bar{x} = \dfrac{\sum x}{n} = \dfrac{1 + 1 + 1 + \cdots + 1}{13} = \dfrac{31}{13} = 2.385 \). The average number of nuclear power plants per state for deregulated states that have nuclear power plants is 2.385.
The median is found by first arranging the data in order from smallest to largest:
1 1 1 1 1 1 2 2 3 3 4 5 6
Since there are an odd number of data points, the median is the middle number, which is 2. At least half of the deregulated states with nuclear power plants have 2 or fewer plants.
The mode is 1. The most common number of plants among deregulated states that have nuclear power plants is 1.
d. Because the average number of nuclear power plants in states that are deregulated is greater than the average number of nuclear power plants in states that are regulated, it appears that regulation limits the number of nuclear power plants.
e. After deleting the largest observation, the mean is \( \bar{x} = \dfrac{\sum x}{n} = \dfrac{2 + 1 + 1 + \cdots + 1}{29} = \dfrac{56}{29} = 1.931 \). The average number of nuclear power plants per state for states that have nuclear power plants is 1.931.
The median is found by first arranging the data in order from smallest to largest:
1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 4 4 5
Since there are an odd number of data points, the median is the middle number, which is 2. At least half of the states with nuclear power plants have 2 or fewer plants.
The mode is 1. The most common number of plants among states that have nuclear power plants is 1.
By deleting the largest observation, the mean decreases, but the median and mode remain the same.
f. The trimmed mean is \( \bar{x} = \dfrac{\sum x}{n} = \dfrac{2 + 1 + 1 + \cdots + 1}{26} = \dfrac{49}{26} = 1.885 \). The trimmed mean is not affected by extreme values.
2.55 a. Due to the “elite” superstars, the salary distribution is skewed to the right. Since this implies that the median is less than the mean, the players’ association would want to use the median.
b. The owners, by the logic of part a, would want to use the mean.
2.56 a. The primary disadvantage of using the range to compare variability of data sets is that the two data sets can have the same range and be vastly different with respect to data variation. Also, the range is greatly affected by extreme measures.
b. The sample variance is the sum of the squared deviations of the observations from the sample mean divided by the sample size minus 1. The population variance is the sum of the squared deviations of the values from the population mean divided by the population size.
c. The variance of a data set can never be negative. The variance of a sample is the sum of the squared deviations from the mean divided by \( n - 1 \). The square of any number, positive or negative, is never negative. Thus, the variance can never be negative (it is zero only when all observations are equal).
The variance is usually greater than the standard deviation. However, it is possible for the variance to be smaller than the standard deviation.
If the data are between 0 and 1, the variance will be smaller than the standard deviation. For example, suppose the data set is .8, .7, .9, .5, and .3. The sample mean is:
\( \bar{x} = \dfrac{\sum x}{n} = \dfrac{.8 + .7 + .9 + .5 + .3}{5} = \dfrac{3.2}{5} = .64 \)
The sample variance is:
\( s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{2.28 - 3.2^2/5}{5 - 1} = \dfrac{.232}{4} = .058 \)
The standard deviation is \( s = \sqrt{.058} = .241 \).
2.57 a. Range \( = 4 - 0 = 4 \); \( s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{22 - 8^2/5}{5 - 1} = 2.3 \); \( s = \sqrt{2.3} = 1.52 \)
b. Range \( = 6 - 0 = 6 \); \( s^2 = \dfrac{63 - 17^2/7}{7 - 1} = 3.619 \); \( s = \sqrt{3.619} = 1.9 \)
c. Range \( = 8 - (-2) = 10 \); \( s^2 = \dfrac{154 - 30^2/10}{10 - 1} = 7.111 \); \( s = \sqrt{7.111} = 2.67 \)
d. Range \( = 1 - (-3) = 4 \); \( s^2 = \dfrac{25.04 - (-6.8)^2/17}{17 - 1} = 1.395 \); \( s = \sqrt{1.395} = 1.18 \)
2.58 a. \( s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{84 - 20^2/10}{10 - 1} = 4.8889 \); \( s = \sqrt{4.8889} = 2.211 \)
b. \( s^2 = \dfrac{380 - 100^2/40}{40 - 1} = 3.3333 \); \( s = \sqrt{3.3333} = 1.826 \)
c. \( s^2 = \dfrac{18 - 17^2/20}{20 - 1} = .1868 \); \( s = \sqrt{.1868} = .432 \)
2.59 a. \( \sum x = 3 + 1 + 10 + 10 + 4 = 28 \); \( \sum x^2 = 3^2 + 1^2 + 10^2 + 10^2 + 4^2 = 226 \)
\( \bar{x} = \dfrac{\sum x}{n} = \dfrac{28}{5} = 5.6 \); \( s^2 = \dfrac{226 - 28^2/5}{5 - 1} = \dfrac{69.2}{4} = 17.3 \); \( s = \sqrt{17.3} = 4.1593 \)
b. \( \sum x = 8 + 10 + 32 + 5 = 55 \); \( \sum x^2 = 8^2 + 10^2 + 32^2 + 5^2 = 1213 \)
\( \bar{x} = \dfrac{55}{4} = 13.75 \) feet; \( s^2 = \dfrac{1213 - 55^2/4}{4 - 1} = \dfrac{456.75}{3} = 152.25 \) square feet; \( s = \sqrt{152.25} = 12.339 \) feet
c. \( \sum x = -1 + (-4) + (-3) + 1 + (-4) + (-4) = -15 \); \( \sum x^2 = (-1)^2 + (-4)^2 + (-3)^2 + 1^2 + (-4)^2 + (-4)^2 = 59 \)
\( \bar{x} = \dfrac{-15}{6} = -2.5 \); \( s^2 = \dfrac{59 - (-15)^2/6}{6 - 1} = \dfrac{21.5}{5} = 4.3 \); \( s = \sqrt{4.3} = 2.0736 \)
d. \( \sum x = \frac{1}{5} + \frac{1}{5} + \frac{1}{5} + \frac{2}{5} + \frac{1}{5} + \frac{4}{5} = \frac{10}{5} = 2 \); \( \sum x^2 = \left(\frac{1}{5}\right)^2 + \left(\frac{1}{5}\right)^2 + \left(\frac{1}{5}\right)^2 + \left(\frac{2}{5}\right)^2 + \left(\frac{1}{5}\right)^2 + \left(\frac{4}{5}\right)^2 = \frac{24}{25} = .96 \)
\( \bar{x} = \dfrac{2}{6} = \dfrac{1}{3} = .33 \) ounce; \( s^2 = \dfrac{\frac{24}{25} - 2^2/6}{6 - 1} = \dfrac{.2933}{5} = .0587 \) square ounce; \( s = \sqrt{.0587} = .2422 \) ounce
2.60 a. Range \( = 42 - 37 = 5 \); \( s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{7935 - 199^2/5}{5 - 1} = 3.7 \); \( s = \sqrt{3.7} = 1.92 \)
b. Range \( = 100 - 1 = 99 \); \( s^2 = \dfrac{25{,}795 - 303^2/9}{9 - 1} = 1{,}949.25 \); \( s = \sqrt{1{,}949.25} = 44.15 \)
c. Range \( = 100 - 2 = 98 \); \( s^2 = \dfrac{20{,}033 - 295^2/8}{8 - 1} = 1{,}307.84 \); \( s = \sqrt{1{,}307.84} = 36.16 \)
2.61 This is one possibility for the two data sets.
Data Set 1: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Data Set 2: 0, 0, 1, 1, 2, 2, 3, 3, 9, 9
The two sets of data above have the same range = largest measurement − smallest measurement = 9 − 0 = 9.
The means for the two data sets are:
\( \bar{x}_1 = \dfrac{\sum x}{n} = \dfrac{0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9}{10} = \dfrac{45}{10} = 4.5 \)
\( \bar{x}_2 = \dfrac{\sum x}{n} = \dfrac{0 + 0 + 1 + 1 + 2 + 2 + 3 + 3 + 9 + 9}{10} = \dfrac{30}{10} = 3 \)
The dot diagrams for the two data sets are shown below.
[Dotplot of x1, x2]
2.62 This is one possibility for the two data sets.
Data Set 1: 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
Data Set 2: 1, 1, 1, 1, 1, 5, 5, 5, 5, 5
\( \bar{x}_1 = \dfrac{\sum x}{n} = \dfrac{1 + 1 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5}{10} = \dfrac{30}{10} = 3 \)
\( \bar{x}_2 = \dfrac{\sum x}{n} = \dfrac{1 + 1 + 1 + 1 + 1 + 5 + 5 + 5 + 5 + 5}{10} = \dfrac{30}{10} = 3 \)
Therefore, the two data sets have the same mean. The variances for the two data sets are:
\( s_1^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{110 - 30^2/10}{9} = \dfrac{20}{9} = 2.2222 \)
\( s_2^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{130 - 30^2/10}{9} = \dfrac{40}{9} = 4.4444 \)
The dot diagrams for the two data sets are shown below.
[Dotplot of x1, x2]
2.63 a. Range \( = 3 - 0 = 3 \); \( s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{15 - 7^2/5}{5 - 1} = 1.3 \); \( s = \sqrt{1.3} = 1.14 \)
b. After adding 3 to each of the data points, Range \( = 6 - 3 = 3 \); \( s^2 = \dfrac{102 - 22^2/5}{5 - 1} = 1.3 \); \( s = \sqrt{1.3} = 1.14 \)
c. After subtracting 4 from each of the data points, Range \( = -1 - (-4) = 3 \); \( s^2 = \dfrac{39 - (-13)^2/5}{5 - 1} = 1.3 \); \( s = \sqrt{1.3} = 1.14 \)
d. The range, variance, and standard deviation remain the same when any number is added to or subtracted from each measurement in the data set.
2.64 The ecolabel that had the most variation in the numerical responses is Audubon International because it has the largest standard deviation.
2.65 a. The range of permeability scores for group A sandstone slices is Range \( = \max - \min = 122.4 - 55.2 = 67.2 \).
b. The variance of group A sandstone slices is \( s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{562{,}778 - 7{,}362.3^2/100}{100 - 1} = 209.5292 \).
The standard deviation is \( s = \sqrt{209.5292} = 14.475 \).
c. Condition B has the largest range and the largest standard deviation. Thus, condition B has more variable permeability data.
2.66 a. The range is the difference between the maximum and minimum values. The range \( = 24.8 - (-1.6) = 26.4 \). The units of measurement are percents.
b. The variance is \( s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{2236.41 - 140.7^2/13}{13 - 1} = \dfrac{2236.41 - 1522.8069}{12} = \dfrac{713.6031}{12} = 59.4669 \). The units are square percents.
c. The standard deviation is \( s = \sqrt{59.4669} = 7.7115 \). The units are percents.
2.67 a. The range is 155. The statement is accurate.
b. The variance is 722.036. The statement is not accurate. A more accurate statement would be: “The variance of the levels of support for corporate sustainability for the 992 senior managers is 722.036.”
c. The standard deviation is 26.871. If the units of measure for the two distributions are the same, then the distribution of support levels for the 992 senior managers has less variation than a distribution with a standard deviation of 50. If the units of measure for the second distribution are not known, then we cannot compare the variation in the two distributions by looking at the standard deviations alone.
d. The standard deviation best describes the variation in the distribution. The range can be greatly affected by extreme measures. The variance is measured in square units, which are hard to interpret. Thus, the standard deviation is the best measure to describe the variation.
2.68 a. The sample variance of the honey dosage group is: \( s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{4295 - 375^2/35}{35 - 1} = \dfrac{277.142857}{34} = 8.1512605 \)
The standard deviation is: \( s = \sqrt{8.1512605} = 2.855 \)
b. The sample variance of the DM dosage group is: \( s^2 = \dfrac{2631 - 275^2/33}{33 - 1} = \dfrac{339.33333}{32} = 10.604167 \)
The standard deviation is: \( s = \sqrt{10.604167} = 3.256 \)
c. The sample variance of the control group is: \( s^2 = \dfrac{1881 - 241^2/37}{37 - 1} = \dfrac{311.243243}{36} = 8.6456456 \)
The standard deviation is: \( s = \sqrt{8.6456456} = 2.940 \)
d. The group with the most variability is the group with the largest standard deviation, which is the DM group. The group with the least variability is the group with the smallest standard deviation, which is the honey group.
2.69 a. The range is the largest observation minus the smallest observation, or \( 6 - 1 = 5 \).
The variance is: \( s^2 = \dfrac{\sum_i x_i^2 - \left(\sum_i x_i\right)^2/n}{n - 1} = \dfrac{178 - 62^2/30}{30 - 1} = 1.7195 \)
The standard deviation is: \( s = \sqrt{s^2} = \sqrt{1.7195} = 1.311 \)
b. The largest observation is 6. It is deleted from the data set. The new range is: \( 5 - 1 = 4 \).
The variance is: \( s^2 = \dfrac{\sum_i x_i^2 - \left(\sum_i x_i\right)^2/n}{n - 1} = \dfrac{142 - 56^2/29}{29 - 1} = 1.2094 \)
The standard deviation is: \( s = \sqrt{s^2} = \sqrt{1.2094} = 1.100 \)
When the largest observation is deleted, the range, variance and standard deviation decrease.
c. The largest observation is 6 and the smallest is 1. When these two observations are deleted from the data set, the new range is: \( 5 - 1 = 4 \).
The variance is: \( s^2 = \dfrac{\sum_i x_i^2 - \left(\sum_i x_i\right)^2/n}{n - 1} = \dfrac{141 - 55^2/28}{28 - 1} = 1.2209 \)
The standard deviation is: \( s = \sqrt{s^2} = \sqrt{1.2209} = 1.1049 \)
When the largest and smallest observations are deleted, the range, variance and standard deviation decrease.
2.70 a. A worker’s overall time to complete the operation under study is determined by adding the subtask-time averages.
Worker A
The average for subtask 1 is: \( \bar{x} = \dfrac{\sum x}{n} = \dfrac{211}{7} = 30.14 \)
The average for subtask 2 is: \( \bar{x} = \dfrac{\sum x}{n} = \dfrac{21}{7} = 3 \)
Worker A’s overall time is 30.14 + 3 = 33.14.
Worker B
The average for subtask 1 is: \( \bar{x} = \dfrac{\sum x}{n} = \dfrac{213}{7} = 30.43 \)
The average for subtask 2 is: \( \bar{x} = \dfrac{\sum x}{n} = \dfrac{29}{7} = 4.14 \)
Worker B’s overall time is 30.43 + 4.14 = 34.57.
b. Worker A: \( s = \sqrt{\dfrac{\sum x^2 - (\sum x)^2/n}{n - 1}} = \sqrt{\dfrac{6455 - 211^2/7}{7 - 1}} = \sqrt{15.8095} = 3.98 \)
Worker B: \( s = \sqrt{\dfrac{\sum x^2 - (\sum x)^2/n}{n - 1}} = \sqrt{\dfrac{6487 - 213^2/7}{7 - 1}} = \sqrt{.9524} = .98 \)
c. The standard deviations represent the amount of variability in the time it takes the worker to complete subtask 1.
d. Worker A: \( s = \sqrt{\dfrac{\sum x^2 - (\sum x)^2/n}{n - 1}} = \sqrt{\dfrac{67 - 21^2/7}{7 - 1}} = \sqrt{.6667} = .82 \)
Worker B: \( s = \sqrt{\dfrac{\sum x^2 - (\sum x)^2/n}{n - 1}} = \sqrt{\dfrac{147 - 29^2/7}{7 - 1}} = \sqrt{4.4762} = 2.12 \)
e. I would choose workers similar to Worker B to perform subtask 1. Worker B has a slightly higher average time on subtask 1 (A: \( \bar{x} = 30.14 \), B: \( \bar{x} = 30.43 \)). However, Worker B has a smaller variability in the time it takes to complete subtask 1 (part b). He or she is more consistent in the time needed to complete the task.
I would choose workers similar to Worker A to perform subtask 2. Worker A has a smaller average time on subtask 2 (A: \( \bar{x} = 3 \), B: \( \bar{x} = 4.14 \)). Worker A also has a smaller variability in the time needed to complete subtask 2 (part d).
2.71 a. The unit of measurement of the variable of interest is dollars (the same as the mean and standard deviation). Based on this, the data are quantitative.
b.
Since no information is given about the shape of the data set, we can only use Chebyshev’s Rule.
$900 is 2 standard deviations below the mean, and $2100 is 2 standard deviations above the mean. Using Chebyshev’s Rule, at least 3/4 of the measurements (or 3/4 × 200 = 150 measurements) will fall between $900 and $2100.
$600 is 3 standard deviations below the mean and $2400 is 3 standard deviations above the mean. Using Chebyshev’s Rule, at least 8/9 of the measurements (or 8/9 × 200 ≈ 178 measurements) will fall between $600 and $2400.
$1200 is 1 standard deviation below the mean and $1800 is 1 standard deviation above the mean. Using Chebyshev’s Rule, nothing can be said about the number of measurements that will fall between $1200 and $1800.
$1500 is equal to the mean and $2100 is 2 standard deviations above the mean. Using Chebyshev’s Rule, at least 3/4 of the measurements (or 3/4 × 200 = 150 measurements) will fall between $900 and $2100. It is possible that all of the 150 measurements will be between $900 and $1500. Thus, nothing can be said about the number of measurements between $1500 and $2100.
2.72 Since no information is given about the data set, we can only use Chebyshev’s Rule.
a. Nothing can be said about the percentage of measurements which will fall between \( \bar{x} - s \) and \( \bar{x} + s \).
b. At least 3/4 or 75% of the measurements will fall between \( \bar{x} - 2s \) and \( \bar{x} + 2s \).
c. At least 8/9 or 89% of the measurements will fall between \( \bar{x} - 3s \) and \( \bar{x} + 3s \).
2.73 According to the Empirical Rule:
a. Approximately 68% of the measurements will be contained in the interval \( \bar{x} - s \) to \( \bar{x} + s \).
b. Approximately 95% of the measurements will be contained in the interval \( \bar{x} - 2s \) to \( \bar{x} + 2s \).
c. Essentially all the measurements will be contained in the interval \( \bar{x} - 3s \) to \( \bar{x} + 3s \).
2.74 a. \( \bar{x} = \dfrac{\sum x}{n} = \dfrac{206}{25} = 8.24 \)
\( s^2 = \dfrac{\sum x^2 - (\sum x)^2/n}{n - 1} = \dfrac{1778 - 206^2/25}{25 - 1} = 3.357 \)
\( s = \sqrt{3.357} = 1.83 \)
b.
Interval                                   Number of Measurements in Interval   Percentage
\( \bar{x} \pm s \), or (6.41, 10.07)      18                                   18/25 = .72 or 72%
\( \bar{x} \pm 2s \), or (4.58, 11.90)     24                                   24/25 = .96 or 96%
\( \bar{x} \pm 3s \), or (2.75, 13.73)     25                                   25/25 = 1.00 or 100%
c. The percentages in part b are in agreement with Chebyshev’s Rule and agree fairly well with the percentages given by the Empirical Rule.
d. Range \( = 12 - 5 = 7 \) and \( s \approx \dfrac{\text{Range}}{4} = \dfrac{7}{4} = 1.75 \)
The range approximation provides a satisfactory estimate of \( s = 1.83 \) from part a.
2.75 Using Chebyshev’s Rule, at least 8/9 of the measurements will fall within 3 standard deviations of the mean. Thus, the range of the data would be around 6 standard deviations. Using the Empirical Rule, approximately 95% of the observations are within 2 standard deviations of the mean. Thus, the range of the data would be around 4 standard deviations. We would expect the standard deviation to be somewhere between Range/6 and Range/4.
For our data, the range \( = 760 - 135 = 625 \). Then \( \dfrac{\text{Range}}{6} = \dfrac{625}{6} = 104.17 \) and \( \dfrac{\text{Range}}{4} = \dfrac{625}{4} = 156.25 \).
Therefore, I would estimate that the standard deviation of the data set is between 104.17 and 156.25.
It would not be feasible to have a standard deviation of 25. If the standard deviation were 25, the data would span 625/25 = 25 standard deviations. This would be extremely unlikely.
2.76 a. \( z = \dfrac{263 - 353}{30} = -3 \). A score of 263 would be 3 standard deviations below the mean.
\( z = \dfrac{443 - 353}{30} = 3 \). A score of 443 would be 3 standard deviations above the mean.
Using Chebyshev’s Rule, at least 8/9 of the observations will be within 3 standard deviations of the mean.
b. For a mound-shaped, symmetric distribution, approximately 99.7% of the observations will be within 3 standard deviations of the mean, using the Empirical Rule.
c. \( z = \dfrac{109 - 184}{25} = -3 \). A score of 109 would be 3 standard deviations below the mean.
\( z = \dfrac{259 - 184}{25} = 3 \). A score of 259 would be 3 standard deviations above the mean.
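The z-score arithmetic in Exercise 2.76 can be checked with a short Python sketch. The function names `z_score` and `chebyshev_lower_bound` are illustrative choices and are not part of the manual; the means and standard deviations (353 and 30, 184 and 25) are the values used in the solution.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std


def chebyshev_lower_bound(k):
    """Chebyshev's Rule: at least 1 - 1/k^2 of the data lie within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2


# Part a: mean 353, standard deviation 30
print(z_score(263, 353, 30))  # -3.0
print(z_score(443, 353, 30))  # 3.0

# Part c: mean 184, standard deviation 25
print(z_score(109, 184, 25))  # -3.0
print(z_score(259, 184, 25))  # 3.0

# Fraction guaranteed within 3 standard deviations (8/9)
print(chebyshev_lower_bound(3))
```

The Chebyshev bound holds for any data set, while the 99.7% figure from the Empirical Rule applies only when the distribution is mound-shaped and symmetric.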
Using Chebyshev’s Rule, at least 8/9 of the observations will be within 3 standard deviations of the mean.
d. For a mound-shaped, symmetric distribution, approximately 99.7% of the observations will be within 3 standard deviations of the mean, using the Empirical Rule.
2.77 a. Because the distribution is skewed, we will use Chebyshev’s Rule. At least 8/9 of the observations will be within 3 standard deviations of the mean:
\( \bar{x}_A \pm 3s_A \Rightarrow 73.62 \pm 3(14.48) \Rightarrow 73.62 \pm 43.44 \Rightarrow (30.18,\ 117.06) \)
b. Because the distribution is skewed, we will use Chebyshev’s Rule. At least 8/9 of the observations will be within 3 standard deviations of the mean:
\( \bar{x}_B \pm 3s_B \Rightarrow 128.54 \pm 3(21.97) \Rightarrow 128.54 \pm 65.91 \Rightarrow (62.63,\ 194.45) \)
c. Because the distribution is skewed, we will use Chebyshev’s Rule. At least 8/9 of the observations will be within 3 standard deviations of the mean:
\( \bar{x}_C \pm 3s_C \Rightarrow 83.07 \pm 3(20.05) \Rightarrow 83.07 \pm 60.15 \Rightarrow (22.92,\ 143.22) \)
d. Although all the intervals overlap, it appears that weathering group B results in faster decay because the sample mean is higher and the upper limit of the interval is much higher than the upper limit for the other two weathering types.
2.78 a. Using MINITAB, the histogram of the data is:
[Histogram of Wheels: Frequency versus Wheels]
Since the distribution is skewed to the right, it is not mound-shaped and it is not symmetric.
b. Using MINITAB, the results are:
Descriptive Statistics: Wheels
Variable   N   Mean   StDev  Minimum     Q1  Median     Q3  Maximum
Wheels    28  3.214   1.371    1.000  2.000   3.000  4.000    8.000
The mean is 3.214 and the standard deviation is 1.371.
c. The interval is: \( \bar{x} \pm 2s \Rightarrow 3.214 \pm 2(1.371) \Rightarrow 3.214 \pm 2.742 \Rightarrow (0.472,\ 5.956) \).
d. According to Chebyshev’s Rule, at least 75% of the observations will fall within 2 standard deviations of the mean.
e. According to the Empirical Rule, approximately 95% of the observations will fall within 2 standard deviations of the mean.
f. Actually, 26 of the 28 or 26/28 = .929 of the observations fall within the interval. This value is close to the 95% that we would expect with the Empirical Rule.
2.79 a. The interval \( \bar{x} \pm 2s \) will contain at least 75% of the observations. This interval is \( \bar{x} \pm 2s \Rightarrow 3.11 \pm 2(.66) \Rightarrow 3.11 \pm 1.32 \Rightarrow (1.79,\ 4.43) \).
b. No. The value 1.25 does not fall in the interval \( \bar{x} \pm 2s \). We know that at least 75% of all observations will fall within 2 standard deviations of the mean. Since 1.25 falls more than 2 standard deviations from the mean, it would not be a likely value to observe.
2.80 a. Since the data are mound-shaped and symmetric, we know from the Empirical Rule that approximately 95% of the observations will fall within 2 standard deviations of the mean. This interval will be: \( \bar{x} \pm 2s \Rightarrow 39 \pm 2(6) \Rightarrow 39 \pm 12 \Rightarrow (27,\ 51) \).
b. We know that approximately .05 of the observations will fall outside the range 27 to 51. Since the distribution of scores is symmetric, we know that half of the .05 or .025 will fall above 51.
c. We know from the Empirical Rule that approximately 99.7% (essentially all) of the observations will fall within 3 standard deviations of the mean. This interval is: \( \bar{x} \pm 3s \Rightarrow 39 \pm 3(6) \Rightarrow 39 \pm 18 \Rightarrow (21,\ 57) \).
2.81 a. The sample mean is: \( \bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{18{,}482}{195} = 94.78 \)
The sample variance is: \( s^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2/n}{n - 1} = \dfrac{1{,}756{,}550 - 18{,}482^2/195}{195 - 1} = 24.9254 \)
The standard deviation is: \( s = \sqrt{s^2} = \sqrt{24.9254} = 4.9925 \)
b. \( \bar{x} \pm s \Rightarrow 94.78 \pm 4.99 \Rightarrow (89.79,\ 99.77) \)
\( \bar{x} \pm 2s \Rightarrow 94.78 \pm 2(4.99) \Rightarrow 94.78 \pm 9.98 \Rightarrow (84.80,\ 104.76) \)
\( \bar{x} \pm 3s \Rightarrow 94.78 \pm 3(4.99) \Rightarrow 94.78 \pm 14.97 \Rightarrow (79.81,\ 109.75) \)
c. There are 143 out of 195 observations in the first interval. This is \( (143/195) \times 100\% = 73.3\% \). There are 189 out of 195 observations in the second interval.
This is \( (189/195) \times 100\% = 96.9\% \). There are 191 out of 195 observations in the third interval. This is \( (191/195) \times 100\% = 97.9\% \).
The percentages for the first 2 intervals are somewhat larger than we would expect using the Empirical Rule. The Empirical Rule indicates that approximately 68% of the observations will fall within 1 standard deviation of the mean. It also indicates that approximately 95% of the observations will fall within 2 standard deviations of the mean. Chebyshev’s Theorem says that at least 3/4 or 75% of the observations will fall within 2 standard deviations of the mean and at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean. It appears that our observed percentages agree with Chebyshev’s Theorem better than the Empirical Rule.
2.82 Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Deaths
Variable   N   Mean  StDev  Minimum    Q1  Median     Q3  Maximum
Deaths    27  163.4  227.4      4.0  29.0    68.0  184.0    955.0
Since the data are not mound-shaped, we will use Chebyshev’s Rule. At least 8/9 of the observations will fall within 3 standard deviations of the mean. This interval is:
\( \bar{x} \pm 3s \Rightarrow 163.4 \pm 3(227.4) \Rightarrow 163.4 \pm 682.2 \Rightarrow (-518.8,\ 845.6) \).
Since no observations can be negative, most observations will fall between 0 and 845.6.
2.83 Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Q2
Variable  Q1          N    Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Q2        No          1  2.0000      *   2.0000      *  2.0000      *   2.0000
          Undecided   5   4.800  0.447    4.000  4.500   5.000  5.000    5.000
          Yes        30   3.967  0.850    2.000  3.000   4.000  5.000    5.000
The data for those users who believe there should be national standards are close to being mound-shaped and symmetric. Therefore, we will use the Empirical Rule. Approximately 95% of the observations fall within 2 standard deviations of the mean. This interval is:
\( \bar{x} \pm 2s \Rightarrow 3.967 \pm 2(.85) \Rightarrow 3.967 \pm 1.70 \Rightarrow (2.267,\ 5.667) \)
2.84 a.
The average ranking for contestants with a first degree who competed for a job with Lord Sugar is 7.796.
b. Approximately 95% of the observations will fall within 2 standard deviations of the mean. This interval is:
\( \bar{x} \pm 2s \Rightarrow 7.796 \pm 2(4.231) \Rightarrow 7.796 \pm 8.462 \Rightarrow (-.666,\ 16.258) \)
Since no observations can be negative, the interval will be 0 to 16.258.
c. No. It appears that just the opposite is true. When the prize was a job, the higher the education level of the contestant, the higher the mean rating. When the prize was a partnership, the higher the education level of the contestant, the lower the mean rating.
2.85 a. The interval \( \bar{x} \pm 3s \) for the flexed arm group is \( \bar{x} \pm 3s \Rightarrow 59 \pm 3(4) \Rightarrow 59 \pm 12 \Rightarrow (47,\ 71) \). The interval for the extended arm group is \( \bar{x} \pm 3s \Rightarrow 43 \pm 3(2) \Rightarrow 43 \pm 6 \Rightarrow (37,\ 49) \). We know that at least 8/9 or 88.9% of the observations will fall within 3 standard deviations of the mean using Chebyshev’s Rule. Since these 2 intervals barely overlap, the information supports the researchers’ theory. The shoppers from the flexed arm group are more likely to select vice options than the extended arm group.
b. The interval \( \bar{x} \pm 2s \) for the flexed arm group is \( \bar{x} \pm 2s \Rightarrow 59 \pm 2(10) \Rightarrow 59 \pm 20 \Rightarrow (39,\ 79) \). The interval for the extended arm group is \( \bar{x} \pm 2s \Rightarrow 43 \pm 2(15) \Rightarrow 43 \pm 30 \Rightarrow (13,\ 73) \). Since these two intervals overlap almost completely, the information does not support the researchers’ theory. There does not appear to be any difference between the two groups.
2.86 a. Yes. The distribution of the buy-side analysts is fairly flat and skewed to the right. The distribution of the sell-side analysts is more mound-shaped and is not spread out as far as the buy-side distribution. Since the buy-side distribution is more spread out, the variance of the buy-side distribution will be larger than the variance of the sell-side distribution. Because the buy-side distribution is skewed to the right, the mean will be pulled to the right.
Thus, the mean of the buy-side distribution will be greater than the mean of the sell-side distribution.
b. Since the sell-side distribution is fairly mound-shaped, we can use the Empirical Rule. The Empirical Rule says that approximately 95% of the observations will fall within 2 standard deviations of the mean. The interval for the sell-side distribution would be:
\( \bar{x} \pm 2s \Rightarrow -.05 \pm 2(.85) \Rightarrow -.05 \pm 1.7 \Rightarrow (-1.75,\ 1.65) \)
Since the buy-side distribution is skewed to the right, we cannot use the Empirical Rule. Thus, we will use Chebyshev’s Rule. We know that at least \( 1 - 1/k^2 \) of the observations will fall within \( k \) standard deviations of the mean. If we choose \( k = 4 \), then \( 1 - 1/4^2 = .9375 \) or 93.75%. This is very close to the 95% requested in the problem. The interval for the buy-side distribution to contain at least 93.75% of the observations would be:
\( \bar{x} \pm 4s \Rightarrow .85 \pm 4(1.93) \Rightarrow .85 \pm 7.72 \Rightarrow (-6.87,\ 8.57) \)
Note: This interval will contain at least 93.75% of the observations. It may contain more than 93.75% of the observations.
2.87 Since we do not know if the distribution of the heights of the trees is mound-shaped, we need to apply Chebyshev’s Rule. We know \( \mu = 30 \) and \( \sigma = 3 \). Therefore, \( \mu \pm 3\sigma \Rightarrow 30 \pm 3(3) \Rightarrow 30 \pm 9 \Rightarrow (21,\ 39) \).
According to Chebyshev’s Rule, at least \( 8/9 \approx .89 \) of the tree heights on this piece of land fall within this interval and at most \( 1/9 \approx .11 \) of the tree heights will fall above the interval. However, the buyer will only purchase the land if at least \( \dfrac{1000}{5000} = .20 \) of the tree heights are at least 40 feet tall. Therefore, the buyer should not buy the piece of land.
2.88 a. Since we do not have any idea of the shape of the distribution of SAT-Math score changes, we must use Chebyshev’s Theorem. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean.
This interval would be:
\( \bar{x} \pm 3s \Rightarrow 19 \pm 3(65) \Rightarrow 19 \pm 195 \Rightarrow (-176,\ 214) \)
Thus, for a randomly selected student, we could be pretty sure that this student’s score would be anywhere from 176 points below his/her previous SAT-Math score to 214 points above his/her previous SAT-Math score.
b. Since we do not have any idea of the shape of the distribution of SAT-Verbal score changes, we must use Chebyshev’s Theorem. We know that at least 8/9 of the observations will fall within 3 standard deviations of the mean. This interval would be:
\( \bar{x} \pm 3s \Rightarrow 7 \pm 3(49) \Rightarrow 7 \pm 147 \Rightarrow (-140,\ 154) \)
Thus, for a randomly selected student, we could be pretty sure that this student’s score would be anywhere from 140 points below his/her previous SAT-Verbal score to 154 points above his/her previous SAT-Verbal score.
c. A change of 140 points on the SAT-Math would be a little less than 2 standard deviations from the mean. A change of 140 points on the SAT-Verbal would be a little less than 3 standard deviations from the mean. Since the 140-point change is less extreme for the SAT-Math than for the SAT-Verbal, it is more likely that the score was an SAT-Math score.
2.89 We know \( \mu = 25 \) and \( \sigma = .1 \). Therefore, \( \mu \pm 2\sigma \Rightarrow 25 \pm 2(.1) \Rightarrow 25 \pm .2 \Rightarrow (24.8,\ 25.2) \)
The machine is shut down for adjustment if the contents of two consecutive bags fall more than 2 standard deviations from the mean (i.e., outside the interval (24.8, 25.2)). Therefore, the machine was shut down yesterday at 11:30 (25.23 and 25.25 are outside the interval) and again at 4:00 (24.71 and 25.31 are outside the interval).
2.90 a. \( z = \dfrac{x - \bar{x}}{s} = \dfrac{40 - 30}{5} = 2 \) (sample): 2 standard deviations above the mean.
b. \( z = \dfrac{x - \mu}{\sigma} = \dfrac{90 - 89}{2} = .5 \) (population): .5 standard deviations above the mean.
c. \( z = \dfrac{x - \mu}{\sigma} = \dfrac{50 - 50}{5} = 0 \) (population): 0 standard deviations above the mean.
d. \( z = \dfrac{x - \bar{x}}{s} = \dfrac{20 - 30}{4} = -2.5 \) (sample): 2.5 standard deviations below the mean.
2.91
Using the definition of a percentile:
    Percentile   Percentage Above   Percentage Below
a.  75th         25%                75%
b.  50th         50%                50%
c.  20th         80%                20%
d.  84th         16%                84%
2.92 \( Q_L \) corresponds to the 25th percentile. \( Q_M \) corresponds to the 50th percentile. \( Q_U \) corresponds to the 75th percentile.
2.93 We first compute z-scores for each x value.
a. \( z = \dfrac{x - \mu}{\sigma} = \dfrac{100 - 50}{25} = 2 \)
b. \( z = \dfrac{x - \mu}{\sigma} = \dfrac{1 - 4}{1} = -3 \)
c. \( z = \dfrac{x - \mu}{\sigma} = \dfrac{0 - 200}{100} = -2 \)
d. \( z = \dfrac{x - \mu}{\sigma} = \dfrac{10 - 5}{3} = 1.67 \)
The above z-scores indicate that the x value in part a lies the greatest distance above the mean and the x value of part b lies the greatest distance below the mean.
2.94 Since the element 40 has a z-score of −2 and 90 has a z-score of 3,
\( -2 = \dfrac{40 - \mu}{\sigma} \Rightarrow -2\sigma = 40 - \mu \Rightarrow \mu - 2\sigma = 40 \Rightarrow \mu = 40 + 2\sigma \)
and
\( 3 = \dfrac{90 - \mu}{\sigma} \Rightarrow 3\sigma = 90 - \mu \Rightarrow \mu + 3\sigma = 90 \)
By substitution, \( 40 + 2\sigma + 3\sigma = 90 \Rightarrow 5\sigma = 50 \Rightarrow \sigma = 10 \) and \( \mu = 40 + 2(10) = 60 \).
Therefore, the population mean is 60 and the standard deviation is 10.
2.95 The mean score of U.S. eighth-graders on a mathematics assessment test is 282. This is the average score. The 25th percentile is 258. This means that 25% of the U.S. eighth-graders score below 258 on the test and 75% score higher. The 75th percentile is 308. This means that 75% of the U.S. eighth-graders score below 308 on the test and 25% score higher. The 90th percentile is 329. This means that 90% of the U.S. eighth-graders score below 329 on the test and 10% score higher.
2.96 a. \( z = \dfrac{x - \bar{x}}{s} = \dfrac{400 - 353}{30} = 1.57 \). A transformer with 400 sags in a week is 1.57 standard deviations above the mean.
b. \( z = \dfrac{x - \bar{x}}{s} = \dfrac{100 - 184}{25} = -3.36 \). A transformer with 100 swells in a week is 3.36 standard deviations below the mean.
2.97 A mean current salary of $57,000 indicates that the average current salary of the University of South Florida graduates is $57,000. At mid-career, half of the University of South Florida graduates had a salary less than $48,000 and half had salaries greater than $48,000. At mid-career, 90% of the University of South Florida graduates had salaries under $131,000 and 10% had salaries greater than $131,000.
2.98 a. From Exercise 2.81, \( \bar{x} = 94.78 \) and \( s = 4.99 \). The z-score for an observation of 73 is:
\( z = \dfrac{x - \bar{x}}{s} = \dfrac{73 - 94.78}{4.99} = -4.36 \)
This z-score indicates that an observation of 73 is 4.36 standard deviations below the mean. Very few observations will be lower than this one.
b. The z-score for an observation of 91 is:
\( z = \dfrac{x - \bar{x}}{s} = \dfrac{91 - 94.78}{4.99} = -0.76 \)
This z-score indicates that an observation of 91 is .76 standard deviations below the mean. This score is not an unusual observation in the data set.
2.99 Since the 90th percentile of the study sample in the subdivision was .00372 mg/L, which is less than the USEPA level of .015 mg/L, the water customers in the subdivision are not at risk of drinking water with unhealthy lead levels.
2.100 The z-score associated with a score of 155 is \( z = \dfrac{x - \bar{x}}{s} = \dfrac{155 - 67.755}{26.871} = 3.25 \). This score would not be considered a typical level of support. It is 3.25 standard deviations above the mean. Very few observations would be above this value.
2.101 The average ROE is 13.93. The median ROE is 14.86, meaning 50% of firms have ROE below 14.86. The 5th percentile is −19.64, meaning 5% of firms have ROE below −19.64. The 25th percentile is 7.59, meaning 25% of firms have ROE below 7.59. The 75th percentile is 21.32, meaning 75% of firms have ROE below 21.32. The 95th percentile is 38.42, meaning 95% of firms have ROE below 38.42. The standard deviation is 21.65. Most observations will fall within 2s, or 43.30 units, of the mean. The distribution will be somewhat skewed to the left as the 5th percentile value is much further from the median than the 95th percentile value.
2.102 a.
Since the data are approximately mound-shaped, we can use the Empirical Rule. On the blue exam, the mean is 53% and the standard deviation is 15%. We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is:
\(\bar{x} \pm s \Rightarrow 53 \pm 15 \Rightarrow (38, 68)\)
About 95% of all students will score within 2 standard deviations of the mean. This interval is:
\(\bar{x} \pm 2s \Rightarrow 53 \pm 2(15) \Rightarrow 53 \pm 30 \Rightarrow (23, 83)\)
About 99.7% of all students will score within 3 standard deviations of the mean. This interval is:
\(\bar{x} \pm 3s \Rightarrow 53 \pm 3(15) \Rightarrow 53 \pm 45 \Rightarrow (8, 98)\)
b. Since the data are approximately mound-shaped, we can use the Empirical Rule. On the red exam, the mean is 39% and the standard deviation is 12%. We know that approximately 68% of all students will score within 1 standard deviation of the mean. This interval is:
\(\bar{x} \pm s \Rightarrow 39 \pm 12 \Rightarrow (27, 51)\)
About 95% of all students will score within 2 standard deviations of the mean. This interval is:
\(\bar{x} \pm 2s \Rightarrow 39 \pm 2(12) \Rightarrow 39 \pm 24 \Rightarrow (15, 63)\)
About 99.7% of all students will score within 3 standard deviations of the mean. This interval is:
\(\bar{x} \pm 3s \Rightarrow 39 \pm 3(12) \Rightarrow 39 \pm 36 \Rightarrow (3, 75)\)
c. The student would have been more likely to have taken the red exam. For the blue exam, we know that approximately 95% of all scores will be from 23% to 83%. The observed 20% score does not fall in this range. For the red exam, we know that approximately 95% of all scores will be from 15% to 63%. The observed 20% score does fall in this range. Thus, it is more likely that the student would have taken the red exam.

2.103 a. The z-score for Harvard is z = 5.08. This means that Harvard’s productivity score was 5.08 standard deviations above the mean. This is extremely high and extremely unusual.
b. The z-score for Howard University is z = -.85. This means that Howard University’s productivity score was .85 standard deviations below the mean. This is not an unusual z-score.
c. Yes.
Other indicators that the distribution is skewed to the right are the values of the highest and lowest z-scores. The lowest z-score is less than 1 standard deviation below the mean while the highest z-score is 5.08 standard deviations above the mean. Using MINITAB, the histogram of the z-scores is:
[Figure: Histogram of Z-Score; horizontal axis Z-Score, vertical axis Frequency]
This histogram does imply that the data are skewed to the right.

2.104 a. From the problem, \(\mu = 2.7\) and \(\sigma = .5\).
\(z = \frac{x - \mu}{\sigma} \Rightarrow z\sigma = x - \mu \Rightarrow x = \mu + z\sigma\)
For z = 2.0, \(x = 2.7 + 2.0(.5) = 3.7\)
For z = -1.0, \(x = 2.7 - 1.0(.5) = 2.2\)
For z = .5, \(x = 2.7 + .5(.5) = 2.95\)
For z = -2.5, \(x = 2.7 - 2.5(.5) = 1.45\)
b. For z = -1.6, \(x = 2.7 - 1.6(.5) = 1.9\)
c. If we assume the distribution of GPAs is approximately mound-shaped, we can use the Empirical Rule. From the Empirical Rule, we know that .025 or 2.5% of the students will have GPAs above 3.7 (with z = 2). Thus, the GPA corresponding to summa cum laude (top 2.5%) will be greater than 3.7 (z > 2). We know that .16 or 16% of the students will have GPAs above 3.2 (z = 1). Thus, the limit on GPAs for cum laude (top 16%) will be greater than 3.2 (z > 1). We must assume the distribution is mound-shaped.

2.105 Not necessarily. Because the distribution is highly skewed to the right, the standard deviation is very large. Remember that the z-score represents the number of standard deviations a score is from the mean. If the standard deviation is very large, then the z-scores for observations somewhat near the mean will appear to be fairly small. If we deleted the schools with the very high productivity scores and recomputed the mean and standard deviation, the standard deviation would be much smaller. Thus, most of the z-scores would be larger because we would be dividing by a much smaller standard deviation.
This would imply a bigger spread among the rest of the schools than the original distribution with the few outliers.

2.106 To determine if the measurements are outliers, compute the z-score.
a. \(z = \frac{x - \bar{x}}{s} = \frac{65 - 57}{11} = .727\). Since the z-score is less than 3, this would not be an outlier.
b. \(z = \frac{x - \bar{x}}{s} = \frac{21 - 57}{11} = -3.273\). Since the z-score is greater than 3 in absolute value, this would be an outlier.
c. \(z = \frac{x - \bar{x}}{s} = \frac{72 - 57}{11} = 1.364\). Since the z-score is less than 3, this would not be an outlier.
d. \(z = \frac{x - \bar{x}}{s} = \frac{98 - 57}{11} = 3.727\). Since the z-score is greater than 3 in absolute value, this would be an outlier.

2.107 The interquartile range is \(\text{IQR} = Q_U - Q_L = 85 - 60 = 25\).
The lower inner fence = \(Q_L - 1.5(\text{IQR}) = 60 - 1.5(25) = 22.5\).
The upper inner fence = \(Q_U + 1.5(\text{IQR}) = 85 + 1.5(25) = 122.5\).
The lower outer fence = \(Q_L - 3(\text{IQR}) = 60 - 3(25) = -15\).
The upper outer fence = \(Q_U + 3(\text{IQR}) = 85 + 3(25) = 160\).
With only this information, the box plot would look something like the following:
[Figure: box plot on an axis from 10 to 110, with the box from 60 to 85, whiskers ending at 22.5 and 100, and the point 18 marked with *]
The whiskers extend to the inner fences unless no data points are that small or that large. The upper inner fence is 122.5. However, the largest data point is 100, so the whisker stops at 100. The lower inner fence is 22.5. The smallest data point is 18, so the whisker extends to 22.5. Since 18 is between the inner and outer fences, it is designated with a *. We do not know whether more than one data point lies below 22.5, so we cannot be sure that the box plot is entirely correct.

2.108 a. Median is approximately 4.
b. \(Q_L\) is approximately 3 (Lower Quartile). \(Q_U\) is approximately 6 (Upper Quartile).
c. \(\text{IQR} = Q_U - Q_L = 6 - 3 = 3\)
d.
The data set is skewed to the right since the right whisker is longer than the left, there is one outlier, and there are two potential outliers.
e. 50% of the measurements are to the right of the median and 75% are to the left of the upper quartile.
f. The upper inner fence is \(Q_U + 1.5(\text{IQR}) = 6 + 1.5(3) = 10.5\). The upper outer fence is \(Q_U + 3(\text{IQR}) = 6 + 3(3) = 15\). Thus, there are two suspect outliers, 12 and 13. There is one highly suspect outlier, 16.

2.109 a. Using MINITAB, the box plots for samples A and B are:
[Figure: Boxplot of Sample A, Sample B; horizontal axis Data]
b. In sample A, the measurement 84 is an outlier. This measurement falls outside the lower outer fence.
Lower outer fence = Lower hinge \(- 3(\text{IQR}) = 150 - 3(172 - 150) = 150 - 3(22) = 84\)
Lower inner fence = Lower hinge \(- 1.5(\text{IQR}) = 150 - 1.5(22) = 117\)
Upper inner fence = Upper hinge \(+ 1.5(\text{IQR}) = 172 + 1.5(22) = 205\)
In addition, 100 may be an outlier. It lies outside the inner fence.
In sample B, 140 and 206 may be outliers. The point 140 lies outside the inner fence while the point 206 lies right at the inner fence.
Lower outer fence = Lower hinge \(- 3(\text{IQR}) = 168 - 3(184 - 169) = 168 - 3(15) = 123\)
Lower inner fence = Lower hinge \(- 1.5(\text{IQR}) = 168 - 1.5(15) = 145.5\)
Upper inner fence = Upper hinge \(+ 1.5(\text{IQR}) = 184 + 1.5(15) = 206.5\)

2.110 a. Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Acad Rep Score
Variable        N   Mean   Minimum  Q1     Median  Q3     Maximum  IQR
Acad Rep Score  50  76.42  47.00    64.75  76.00   89.00  100.00   24.25
The median is 76, the lower quartile is 64.75, and the upper quartile is 89.
b. \(\text{IQR} = Q_U - Q_L = 89 - 64.75 = 24.25\)
c. Using MINITAB, the boxplot is:
[Figure: Boxplot of Academic Rep Score; horizontal axis Academic Rep Score]
d. Suspect outliers lie between \(Q_L - 1.5(\text{IQR})\) and \(Q_L - 3(\text{IQR})\) or between \(Q_U + 1.5(\text{IQR})\) and \(Q_U + 3(\text{IQR})\).
\(Q_L - 1.5(\text{IQR}) = 64.75 - 1.5(24.25) = 28.375\), \(Q_L - 3(\text{IQR}) = 64.75 - 3(24.25) = -8\)
\(Q_U + 1.5(\text{IQR}) = 89 + 1.5(24.25) = 125.375\), \(Q_U + 3(\text{IQR}) = 89 + 3(24.25) = 161.75\)
No scores are less than 28.375 or larger than 125.375. Therefore, there are no outliers or suspect outliers.

2.111 a. \(z = \frac{x - \bar{x}}{s} = \frac{400 - 353}{30} = 1.57\). Since the z-score is less than 2, 400 sags per week would not be considered unusual.
b. \(z = \frac{x - \bar{x}}{s} = \frac{100 - 184}{25} = -3.36\). Since the absolute value of the z-score is greater than 3, 100 swells per week would be considered unusual.

2.112 a. The approximate 25th percentile PASI score before treatment is 10. The approximate median before treatment is 15. The approximate 75th percentile PASI score before treatment is 28.
b. The approximate 25th percentile PASI score after treatment is 3. The approximate median after treatment is 5. The approximate 75th percentile PASI score after treatment is 7.5.
c. Since the 75th percentile after treatment is lower than the 25th percentile before treatment, it appears that the ichthyotherapy is effective in treating psoriasis.

2.113 a. The average expenditure per full-time employee is $6,563. The median expenditure per employee is $6,232. Half of all expenditures per employee were less than $6,232 and half were greater than $6,232. The lower quartile is $5,309. Twenty-five percent of all expenditures per employee were below $5,309. The upper quartile is $7,216. Seventy-five percent of all expenditures per employee were below $7,216.
b. \(\text{IQR} = Q_U - Q_L = \$7{,}216 - \$5{,}309 = \$1{,}907\).
c. The interquartile range goes from the 25th percentile to the 75th percentile. Thus, \(.75 - .25 = .5\), or 50%, of the 1,751 army hospitals have expenses between $5,309 and $7,216.

2.114 a. From the printout, \(\bar{x} = 52.334\) and s = 9.224. The highest salary is 75 (thousand).
The z-score is \(z = \frac{x - \bar{x}}{s} = \frac{75 - 52.334}{9.224} = 2.46\). Therefore, the highest salary is 2.46 standard deviations above the mean.
The lowest salary is 35.0 (thousand). The z-score is \(z = \frac{x - \bar{x}}{s} = \frac{35.0 - 52.334}{9.224} = -1.88\). Therefore, the lowest salary is 1.88 standard deviations below the mean.
The mean salary offer is 52.33 (thousand). The z-score is \(z = \frac{x - \bar{x}}{s} = \frac{52.33 - 52.334}{9.224} \approx 0\). The mean salary offer is 0 standard deviations from the mean.
b. No, the highest salary offer is not unusually high. For any distribution, at least 8/9 of the salaries should have z-scores between -3 and 3. A z-score of 2.46 would not be that unusual. Since no salaries are outside the inner fences, none of them are suspect or highly suspect outliers.

2.115 a. Using MINITAB, the boxplots for each type of firm are:
[Figure: Boxplot of Time for the Joint, None, and Prepack groups; horizontal axis Time, vertical axis Votes]
b. The median bankruptcy time for No prefiling firms is about 3.2. The median bankruptcy time for Joint firms is about 1.5. The median bankruptcy time for Prepack firms is about 1.4.
c. The range of the “Prepack” firms is less than the other two, while the range of the “None” firms is the largest. The interquartile range of the “Prepack” firms is less than the other two, while the interquartile range of the “Joint” firms is larger than the other two.
d. No. The interquartile range for the “Prepack” firms is the smallest, which corresponds to the smallest standard deviation. However, the second smallest interquartile range corresponds to the “None” firms. The second smallest standard deviation corresponds to the “Joint” firms.
e. Yes. There is evidence of two outliers in the “Prepack” firms. These are indicated by the two *’s. There is also evidence of two outliers in the “None” firms. These are indicated by the two *’s.

2.116 From Exercise 2.100, \(\bar{x} = 67.755\) and \(s = 26.87\).
Using MINITAB, a boxplot of the data is:
[Figure: Boxplot of Support; horizontal axis Support]
From the boxplot, the support level of 155 would be an outlier. From Exercise 2.100, we found the z-score associated with a score of 155 as \(z = \frac{x - \bar{x}}{s} = \frac{155 - 67.755}{26.871} = 3.25\). Since this z-score is greater than 3, the observation 155 is considered an outlier.

2.117 a. Using MINITAB, the boxplot is:
[Figure: Boxplot of SCORE; horizontal axis SCORE]
From the boxplot, there appear to be 4 outliers: 69, 73, 76, and 78.
b. From Exercise 2.81, \(\bar{x} = 94.78\) and \(s = 4.99\). Since the data are skewed to the left, we will consider observations more than 2 standard deviations from the mean to be outliers. An observation with a z-score of 2 would have the value:
\(z = \frac{x - \bar{x}}{s} \Rightarrow 2 = \frac{x - 94.78}{4.99} \Rightarrow 2(4.99) = x - 94.78 \Rightarrow 9.98 = x - 94.78 \Rightarrow x = 104.76\)
An observation with a z-score of -2 would have the value:
\(z = \frac{x - \bar{x}}{s} \Rightarrow -2 = \frac{x - 94.78}{4.99} \Rightarrow -2(4.99) = x - 94.78 \Rightarrow -9.98 = x - 94.78 \Rightarrow x = 84.80\)
Observations greater than 104.76 or less than 84.80 would be considered outliers. Using this criterion, the following observations would be outliers: 69, 73, 76, and 78.
c. Yes, the two methods agree. Using the boxplot, 4 observations were identified as outliers. Using the z-score method, the same 4 observations (69, 73, 76, and 78) were identified as outliers.

2.118 a. Using MINITAB, the box plot is:
[Figure: Boxplot of Downtime; horizontal axis Downtime]
The median is about 18. The data appear to be skewed to the right since there are 3 suspect outliers to the right and none to the left. The variability of the data is fairly small because the IQR is fairly small, approximately 26 - 10 = 16.
b. The customers associated with the suspected outliers are customers 268, 269, and 264.
c. In order to find the z-scores, we must first find the mean and standard deviation.
\(\bar{x} = \frac{\sum x}{n} = \frac{815}{40} = 20.375\)
\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{24129 - \frac{815^2}{40}}{40 - 1} = 192.90705\)
\(s = \sqrt{192.90705} = 13.89\)
The z-scores associated with the suspected outliers are:
Customer 268: \(z = \frac{49 - 20.375}{13.89} = 2.06\)
Customer 269: \(z = \frac{50 - 20.375}{13.89} = 2.13\)
Customer 264: \(z = \frac{64 - 20.375}{13.89} = 3.14\)
All the z-scores are greater than 2. These are unusual values.

2.119 Using MINITAB, the boxplots of the data are:
[Figure: Boxplot of PermA, PermB, PermC; horizontal axis Data]
The descriptive statistics are:
Descriptive Statistics: PermA, PermB, PermC
Variable  N    Mean    StDev  Minimum  Q1      Median  Q3      Maximum  IQR
PermA     100  73.62   14.48  55.20    62.00   70.45   81.42   122.40   19.42
PermB     100  128.54  21.97  50.40    108.65  139.30  147.02  150.00   38.37
PermC     100  83.07   20.05  52.20    67.72   78.65   95.35   129.00   27.63
a. For group A, the suspect outliers are any observations greater than \(Q_U + 1.5(\text{IQR}) = 81.42 + 1.5(19.42) = 110.55\) or less than \(Q_L - 1.5(\text{IQR}) = 62 - 1.5(19.42) = 32.87\). There are 3 observations greater than 110.55: 117.3, 118.5, and 122.4.
b. For group B, the suspect outliers are any observations greater than \(Q_U + 1.5(\text{IQR}) = 147.02 + 1.5(38.37) = 204.575\) or less than \(Q_L - 1.5(\text{IQR}) = 108.65 - 1.5(38.37) = 51.095\). There is 1 observation less than 51.095: 50.4.
c. For group C, the suspect outliers are any observations greater than \(Q_U + 1.5(\text{IQR}) = 95.35 + 1.5(27.63) = 136.795\) or less than \(Q_L - 1.5(\text{IQR}) = 67.72 - 1.5(27.63) = 26.275\). No observations are greater than 136.795 or less than 26.275.
d. For group A, if the outliers are removed, the mean will decrease, the median will slightly decrease, and the standard deviation will decrease. For group B, if the outlier is removed, the mean will increase, the median will slightly increase, and the standard deviation will decrease.
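The fence arithmetic for the three permeability groups can be checked with a short script. The following is only a sketch: the function names fences and beyond_fences are hypothetical helpers, not MINITAB commands, and the quartiles are copied from the descriptive statistics table rather than computed from the raw data, which are not reproduced here.

```python
def fences(q_lower, q_upper, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    iqr = q_upper - q_lower
    return q_lower - k * iqr, q_upper + k * iqr


def beyond_fences(values, q_lower, q_upper, k=1.5):
    """Return the values lying outside the fences, in input order."""
    low, high = fences(q_lower, q_upper, k)
    return [v for v in values if v < low or v > high]


# (Q1, Q3) pairs copied from the descriptive statistics table.
quartiles = {
    "PermA": (62.00, 81.42),
    "PermB": (108.65, 147.02),
    "PermC": (67.72, 95.35),
}

for group, (q1, q3) in quartiles.items():
    low, high = fences(q1, q3)
    print(f"{group}: inner fences = ({low:.3f}, {high:.3f})")
```

Calling fences with k=3 gives the outer fences used elsewhere in the chapter to separate suspect outliers from highly suspect outliers.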
2.120 For Perturbed Intrinsics, but no Perturbed Projections:
\(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{8.1}{5} = 1.62\)
\(s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{15.63 - \frac{8.1^2}{5}}{5 - 1} = \frac{2.508}{4} = .627\)
\(s = \sqrt{s^2} = \sqrt{.627} = .792\)
The z-score corresponding to a value of 4.5 is \(z = \frac{x - \bar{x}}{s} = \frac{4.5 - 1.62}{.792} = 3.63\).
Since this z-score is greater than 3, we would consider this an outlier for perturbed intrinsics, but no perturbed projections.

For Perturbed Projections, but no Perturbed Intrinsics:
\(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{125.8}{5} = 25.16\)
\(s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{3350.1 - \frac{125.8^2}{5}}{5 - 1} = \frac{184.972}{4} = 46.243\)
\(s = \sqrt{s^2} = \sqrt{46.243} = 6.800\)
The z-score corresponding to a value of 4.5 is \(z = \frac{x - \bar{x}}{s} = \frac{4.5 - 25.16}{6.800} = -3.038\).
Since this z-score is less than -3, we would consider this an outlier for perturbed projections, but no perturbed intrinsics.
Since the z-score corresponding to 4.5 for perturbed projections, but no perturbed intrinsics, is smaller in absolute value than that for perturbed intrinsics, but no perturbed projections, it is more likely that the type of camera perturbation is perturbed projections, but no perturbed intrinsics.

2.121 From the stem-and-leaf display in Exercise 2.34, the data are fairly mound-shaped, but skewed somewhat to the right.
The sample mean is \(\bar{x} = \frac{\sum x}{n} = \frac{1493}{25} = 59.72\).
The sample variance is \(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{96{,}885 - \frac{1493^2}{25}}{25 - 1} = 321.7933\).
The sample standard deviation is \(s = \sqrt{321.7933} = 17.9386\).
The z-score associated with the largest value is \(z = \frac{x - \bar{x}}{s} = \frac{102 - 59.72}{17.9386} = 2.36\).
This observation is a suspect outlier. The observations associated with the one-time customers are 5 of the largest 7 observations. Thus, repeat customers tend to have shorter delivery times than one-time customers.
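The shortcut formula for the sample variance and the z-score of the largest observation in Exercise 2.121 can be verified numerically. The sketch below works only from the summary sums quoted in the solution (n = 25, sum of x = 1493, sum of x squared = 96,885); the function names sample_stats and z_score are illustrative choices, not part of any statistical package.

```python
import math


def sample_stats(n, sum_x, sum_x2):
    """Mean, variance, and standard deviation from n, sum(x), and sum(x^2).

    Uses the shortcut formula s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1).
    """
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance, math.sqrt(variance)


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Summary values quoted in the solution to Exercise 2.121.
mean, variance, sd = sample_stats(n=25, sum_x=1493, sum_x2=96885)
z_largest = z_score(102, mean, sd)
print(f"mean = {mean:.2f}, s^2 = {variance:.4f}, s = {sd:.4f}, z = {z_largest:.2f}")
```

The shortcut formula is convenient for hand calculation; with raw data available, statistics.variance from the Python standard library computes the same sample variance directly.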
2.122 Using MINITAB, a scatterplot of the data is:
[Figure: Scatterplot of Var 2 vs Var 1]

2.123 Using MINITAB, the scatterplot is:
[Figure: Scatterplot of Var 2 vs Var 1]

2.124 Using MINITAB, the scatterplot is:
[Figure: Scatterplot of RATIO vs DIAMETER]
It appears that as the pipe diameter increases, the ratio of repair to replacement cost increases.

2.125 From the scatterplot of the data, it appears that as the number of punishments increases, the average payoff decreases. Thus, there appears to be a negative linear relationship between punishment use and average payoff. This supports the researchers’ conclusion that “winners don’t punish”.

2.126 Using MINITAB, the scatterplot of the data is:
[Figure: Scatterplot of Catch vs Search]
There is an apparent negative linear trend between the search frequency and the total catch. As the search frequency increases, the total catch tends to decrease.

2.127 Using MINITAB, a scattergram of the data is:
[Figure: Scatterplot of SLUGPCT vs ELEVATION]
If we include the observation from Denver, then we would say there might be a linear relationship between slugging percentage and elevation. If we eliminated the observation from Denver, it appears that there might not be a relationship between slugging percentage and elevation.

2.128 Using MINITAB, the scatterplot of the data is:
[Figure: Scatterplot of MATH2014 vs MATH2010]
There appears to be a positive linear trend between the Math SAT scores in 2010 and the Math SAT scores in 2014.
As the 2010 Math SAT scores increase, the 2014 Math SAT scores also tend to increase.

2.129 Using MINITAB, the scatterplot of the data is:
[Figure: Scatterplot of Number vs Hour]
There appears to be a positive linear trend to the data. As the hours increase, the number of accidents tends to increase.

2.130 Using MINITAB, the scatterplot of the data is:
[Figure: Scatterplot of Mass vs Time]
There is evidence to indicate that the mass of the spill tends to diminish as time increases. As time is getting larger, the mass is decreasing.

2.131 a. Using MINITAB, a scatterplot of the data is:
[Figure: Scatterplot of Year2 vs Year1]
There is a moderate positive trend to the data. As the scores for Year1 increase, the scores for Year2 also tend to increase.
b. From the graph, two agencies that had greater than expected PARS evaluation scores for Year2 were USAID and State.

2.132 Using MINITAB, the scatterplot of the data is:
[Figure: Scatterplot of VALUE vs OPINCOME]
There is a positive trend to the data. As operating income increases, the 2015 value also tends to increase. Since the trend is positive, we would recommend that an NFL executive use operating income to predict a team’s current value.

2.133 a. Using MINITAB, the scatterplot of the data is:
[Figure: Scatterplot of Ratio vs Age]
There appears to be a weak, negative relationship between a CEO’s ratio of salary to worker pay and the CEO’s age.
b.
Using MINITAB, the descriptive statistics are:
Descriptive Statistics: Ratio
Variable  N   Mean   StDev  Minimum  Q1     Median  Q3     Maximum  IQR
Ratio     40  641.8  314.8  415.0    481.0  536.5   660.8  1951.0   179.8
Using the interquartile range, the highly suspect outliers are any observations greater than \(Q_U + 3(\text{IQR}) = 660.8 + 3(179.8) = 1{,}200.2\) or less than \(Q_L - 3(\text{IQR}) = 481.0 - 3(179.8) = -58.4\). There are 2 highly suspect outliers: 1,522 and 1,951.
Using the z-score, any observation more than 3 standard deviations above or below the mean is a highly suspect outlier. Three standard deviations above the mean is:
\(z = \frac{x - \bar{x}}{s} \Rightarrow 3 = \frac{x - 641.8}{314.8} \Rightarrow 3(314.8) = x - 641.8 \Rightarrow 944.4 = x - 641.8 \Rightarrow x = 1{,}586.2\)
Three standard deviations below the mean is:
\(z = \frac{x - \bar{x}}{s} \Rightarrow -3 = \frac{x - 641.8}{314.8} \Rightarrow -3(314.8) = x - 641.8 \Rightarrow -944.4 = x - 641.8 \Rightarrow x = -302.6\)
Using this method, there is one highly suspect outlier: 1,951.
c. Removing the observation 1,951, the scatterplot of the data is:
[Figure: Scatterplot of Ratio-Remove1 vs Age-Remove1]
By removing the one highly suspect outlier, the relationship is still negative, but it is a stronger, negative relationship.

2.134 Using MINITAB, a scatterplot of the data is:
[Figure: Scatterplot of ACCURACY vs DISTANCE]
Yes, his concern is a valid one. From the scatterplot, there appears to be a fairly strong negative relationship between accuracy and driving distance. As driving distance increases, the driving accuracy tends to decrease.

2.135 One way the bar graph can mislead the viewer is that the vertical axis has been cut off. Instead of starting at 0, the vertical axis starts at 12. Another way the bar graph can mislead the viewer is that as the bars get taller, the widths of the bars also increase.

2.136 a.
If you work for Volkswagen, you would choose to use the median number of deaths because this is much lower than the mean. The data are skewed to the right, so the median would probably be a better representation of the middle of the distribution.
b. If you support an environmental watch group, you would choose to use the mean number of deaths because this is much greater than the median. The average number of deaths is much higher than the median number of deaths.

2.137 a. The graph might be misleading because the scales on the vertical axes are different. The left vertical axis ranges from 0 to $120 million. The right vertical axis ranges from 0 to $20 billion.
b. Using MINITAB, the redrawn graph is:
[Figure: Time Series Plot of Craigslist, NewspaperAds; horizontal axis Year (2003 to 2009), vertical axis Data]
Although the amount of revenue produced by Craigslist has increased dramatically from 2003 to 2009, it is still much smaller than the revenue produced by newspaper ad sales.

2.138 a. This graph is misleading because it looks as if the number of barrels collected per day increases as the days go on. However, the bars show the cumulative number of barrels collected. The cumulative value can never decrease.
b. Using MINITAB, the graph of the daily collection of oil is:
[Figure: Chart of Barrells; horizontal axis Day (May-16 to May-23), vertical axis Barrells]
This graph shows that there has not been a steady improvement in the suctioning process. There was an increase for 3 days, then a leveling off for 3 days, then a decrease.

2.139 The relative frequency histogram is:
[Figure: Histogram of Class; horizontal axis Measurement Class with class boundaries 1.125, 2.625, 4.125, 5.625, 7.125, 8.625; vertical axis Relative frequency from 0 to .20]

2.140 The mean is sensitive to extreme values in a data set.
Therefore, the median is preferred to the mean when a data set is skewed in one direction or the other.

2.141 a. \(z = \frac{x - \mu}{\sigma} = \frac{50 - 60}{10} = -1\), \(z = \frac{70 - 60}{10} = 1\), \(z = \frac{80 - 60}{10} = 2\)
b. \(z = \frac{x - \mu}{\sigma} = \frac{50 - 50}{5} = 0\), \(z = \frac{70 - 50}{5} = 4\), \(z = \frac{80 - 50}{5} = 6\)
c. \(z = \frac{x - \mu}{\sigma} = \frac{50 - 40}{10} = 1\), \(z = \frac{70 - 40}{10} = 3\), \(z = \frac{80 - 40}{10} = 4\)
d. \(z = \frac{x - \mu}{\sigma} = \frac{50 - 40}{100} = .1\), \(z = \frac{70 - 40}{100} = .3\), \(z = \frac{80 - 40}{100} = .4\)

2.142 a. If we assume that the data are about mound-shaped, then any observation with a z-score greater than 3 in absolute value would be considered an outlier. From Exercise 2.139, the z-score corresponding to 50 is -1, the z-score corresponding to 70 is 1, and the z-score corresponding to 80 is 2. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.
b. From Exercise 2.139, the z-score corresponding to 50 is -2, the z-score corresponding to 70 is 2, and the z-score corresponding to 80 is 4. Since the z-score corresponding to 80 is greater than 3, 80 would be considered an outlier.
c. From Exercise 2.139, the z-score corresponding to 50 is 1, the z-score corresponding to 70 is 3, and the z-score corresponding to 80 is 4. Since the z-scores corresponding to 70 and 80 are greater than or equal to 3, 70 and 80 would be considered outliers.
d. From Exercise 2.139, the z-score corresponding to 50 is .1, the z-score corresponding to 70 is .3, and the z-score corresponding to 80 is .4. Since none of these z-scores is greater than 3 in absolute value, none would be considered outliers.

2.143 a. \(\sum x = 13 + 1 + 10 + 3 + 3 = 30\), \(\sum x^2 = 13^2 + 1^2 + 10^2 + 3^2 + 3^2 = 288\)
\(\bar{x} = \frac{\sum x}{n} = \frac{30}{5} = 6\)
\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{288 - \frac{30^2}{5}}{5 - 1} = \frac{108}{4} = 27\), \(s = \sqrt{27} = 5.20\)
b. \(\sum x = 13 + 6 + 6 + 0 = 25\), \(\sum x^2 = 13^2 + 6^2 + 6^2 + 0^2 = 241\)
\(\bar{x} = \frac{\sum x}{n} = \frac{25}{4} = 6.25\)
\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{241 - \frac{25^2}{4}}{4 - 1} = \frac{84.75}{3} = 28.25\), \(s = \sqrt{28.25} = 5.32\)
c. \(\sum x = 1 + 0 + 1 + 10 + 11 + 11 + 15 = 49\), \(\sum x^2 = 1^2 + 0^2 + 1^2 + 10^2 + 11^2 + 11^2 + 15^2 = 569\)
\(\bar{x} = \frac{\sum x}{n} = \frac{49}{7} = 7\)
\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{569 - \frac{49^2}{7}}{7 - 1} = \frac{226}{6} = 37.67\), \(s = \sqrt{37.67} = 6.14\)
d. \(\sum x = 3 + 3 + 3 + 3 = 12\), \(\sum x^2 = 3^2 + 3^2 + 3^2 + 3^2 = 36\)
\(\bar{x} = \frac{\sum x}{n} = \frac{12}{4} = 3\)
\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{36 - \frac{12^2}{4}}{4 - 1} = \frac{0}{3} = 0\), \(s = \sqrt{0} = 0\)

2.144 a. \(\sum x = 4 + 6 + 6 + 5 + 6 + 7 = 34\), \(\sum x^2 = 4^2 + 6^2 + 6^2 + 5^2 + 6^2 + 7^2 = 198\)
\(\bar{x} = \frac{\sum x}{n} = \frac{34}{6} = 5.67\)
\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{198 - \frac{34^2}{6}}{6 - 1} = \frac{5.3333}{5} = 1.0667\), \(s = \sqrt{1.0667} = 1.03\)
b. \(\sum x = -1 + 4 + (-3) + 0 + (-3) + (-6) = -9\), \(\sum x^2 = (-1)^2 + 4^2 + (-3)^2 + 0^2 + (-3)^2 + (-6)^2 = 71\)
\(\bar{x} = \frac{\sum x}{n} = \frac{-9}{6} = -\$1.5\)
\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{71 - \frac{(-9)^2}{6}}{6 - 1} = \frac{57.5}{5} = 11.5\) dollars squared, \(s = \sqrt{11.5} = \$3.39\)
c. \(\sum x = \frac{3}{5} + \frac{4}{5} + \frac{2}{5} + \frac{1}{5} + \frac{1}{16} = 2.0625\), \(\sum x^2 = \left(\frac{3}{5}\right)^2 + \left(\frac{4}{5}\right)^2 + \left(\frac{2}{5}\right)^2 + \left(\frac{1}{5}\right)^2 + \left(\frac{1}{16}\right)^2 = 1.2039\)
\(\bar{x} = \frac{\sum x}{n} = \frac{2.0625}{5} = .4125\%\)
\(s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{1.2039 - \frac{2.0625^2}{5}}{5 - 1} = \frac{.3531}{4} = .0883\) percent squared, \(s = \sqrt{.0883} = .30\%\)
d. (a) Range = 7 - 4 = 3
(b) Range = $4 - (-$6) = $10
(c) Range = \(\frac{4}{5}\% - \frac{1}{16}\% = \frac{64}{80}\% - \frac{5}{80}\% = \frac{59}{80}\% = .7375\%\)

2.145 The range is found by taking the largest measurement in the data set and subtracting the smallest measurement. Therefore, it only uses two measurements from the whole data set. The standard deviation uses every measurement in the data set. Therefore, it takes every measurement into account, not just two. The range is affected by extreme values more than the standard deviation.

2.146 \(\sigma \approx \frac{\text{range}}{4} = \frac{20}{4} = 5\)

2.147 Using MINITAB, the scatterplot is:
[Figure: Scatterplot of Var 2 vs Var 1]

2.148 a. The z-score is \(z = \frac{x - \bar{x}}{s} = \frac{30 - 39}{6} = -1.5\). A score of 30 is 1.5 standard deviations below the mean.
b.
Since the data are mound-shaped and symmetric and 39 is the mean, .5 of the sampled drug dealers will have WR scores below 39.
c. If 5% of the drug dealers have WR scores above 49, then 95% will have WR scores below 49. Thus, 49 will be the 95th percentile.

2.149 a. Using MINITAB, the pie chart is:
[Figure: Pie Chart of Blog/Forum; Company 38.5%, Employees 34.6%, Third Party 11.5%, Not Identified 15.4%]
Companies and Employees represent (38.5 + 34.6 = 73.1) slightly more than 73% of the entities creating blogs/forums. Third parties are the least common entity.
b. Using Chebyshev’s Rule, at least 75% of the observations will fall within 2 standard deviations of the mean.
\(\bar{x} \pm 2s \Rightarrow 4.25 \pm 2(12.02) \Rightarrow 4.25 \pm 24.04 \Rightarrow (-19.79, 28.29)\), or (0, 28.29) since we cannot have a negative number of blogs.
c. We would expect the distribution to be skewed to the right. We know that we cannot have a negative number of blogs/forums. Even 1 standard deviation below the mean is a negative number. We would assume that there are a few very large observations because the standard deviation is so big compared to the mean.

2.150 a. To find relative frequencies, we divide the frequencies of each category by the total number of incidents. The relative frequencies of the number of incidents for each of the cause categories are:
Management System Cause Category   Number of Incidents   Relative Frequencies
Engineering & Design               27                    27 / 83 = .325
Procedures & Practices             24                    24 / 83 = .289
Management & Oversight             22                    22 / 83 = .265
Training & Communication           10                    10 / 83 = .120
TOTAL                              83                    1
b. The Pareto diagram is:
[Figure: Pareto diagram of Management System Cause Category; categories Eng&Des, Proc&Pract, Mgmt&Over, Trn&Comm; vertical axis Percent]
c. The category with the highest relative frequency of incidents is Engineering and Design. The category with the lowest relative frequency of incidents is Training and Communication.

2.151 a.
The relative frequency for each response category is found by dividing the frequency by the total sample size. The relative frequency for the category “Global Marketing” is 235/2863 = .082. The rest of the relative frequencies are found in a similar manner and are reported in the table.
Area                    Number   Relative Frequencies
Global Marketing        235      235/2863 = .082
Sales Management        494      494/2863 = .173
Buyer Behavior          478      478/2863 = .167
Relationships           498      498/2863 = .174
Innovation              398      398/2863 = .139
Marketing Strategy      280      280/2863 = .098
Channels/Distribution   213      213/2863 = .074
Marketing Research      131      131/2863 = .046
Services                136      136/2863 = .048
TOTAL                   2,863    1.00
Relationships and sales management had the most articles published with 17.4% and 17.3%, respectively. Not far behind was Buyer Behavior with 16.7%. Of the rest of the areas, only innovation had more than 10%.
b. Using MINITAB, the pie chart of the data is:
[Figure: Pie Chart of Area; Global Marketing 8.2%, Sales Management 17.3%, Buyer Behavior 16.7%, Relationships 17.4%, Innovation 13.9%, Marketing Strategy 9.8%, Channels/Distribution 7.4%, Marketing Research 4.6%, Services 4.8%]
The slice for Marketing Research is smaller than the slice for Sales Management because there were fewer articles on Marketing Research than on Sales Management.

2.152 a. The data are time series data because the numbers of bankruptcies were collected over a period of 8 quarters.
b. Using MINITAB, the time series plot is:
[Figure: Time Series Plot of Bankruptcies; horizontal axis Quarter (Q1 to Q4 for two years), vertical axis Bankruptcies]
c. There is a generally decreasing trend in the number of bankruptcies as the quarters increase.

2.153 a.
Using MINITAB, the pie chart is:
[Pie Chart of Drivstar: category 2, 4.1%; category 3, 17.3%; category 4, 60.2%; category 5, 18.4%]
b. The average driver's severity of head injury in head-on collisions is 603.7.
c. Since the mean and median are close in value, the data should be fairly symmetric. Thus, we can use the Empirical Rule. We know that about 95% of all observations will fall within 2 standard deviations of the mean. This interval is
\( \bar{x} \pm 2s = 603.7 \pm 2(185.4) = 603.7 \pm 370.8 = (232.9,\ 974.5) \)
Most of the head-injury ratings will fall between 232.9 and 974.5.
d. The z-score would be: \( z = \frac{x - \bar{x}}{s} = \frac{408 - 603.7}{185.4} = -1.06 \)
Since the absolute value is not very big, this is not an unusual value to observe.

2.154 a. The data collection method was a survey.
b. Since the data fall into 4 different categories, the variable is qualitative.
c. Using MINITAB, a pie chart of the data is:
[Pie Chart of Made in USA: 100%, 60.4%; 75-99%, 18.9%; 50-74%, 17.0%; <50%, 3.8%]
About 60% of those surveyed believe that "Made in USA" means 100% US labor and materials.

2.155 a. Using MINITAB, a Pareto diagram for the data is:
[Chart of Defects: vertical axis Frequency; bars in decreasing order Body, Accessories, Electrical, Transmission, Engine]
The most frequently observed defect is a body defect.
b. Using MINITAB, a Pareto diagram for the Body Defect data is:
[Chart of Body Defects: vertical axis Frequency; bars in decreasing order Paint, Dents, Upholstery, Windshield, Chrome]
Most body defects are either paint or dents. These two categories account for (30 + 25)/70 = 55/70 = .786 of all body defects. Since these two categories account for so much of the body defects, it would seem appropriate to target these two types of body defects for special attention.

2.156 The percentile ranking of the age of 25 years would be 100% - 80% = 20%. Thus, an age of 25 would correspond to the 20th percentile.

2.157 a. The mean amount exported on the printout is 653. This means that the average amount of money per market from exporting sparkling wine was $653,000.
b. The median amount exported on the printout is 231. Since the median is the middle value, this means that half of the 30 sparkling wine export values were above $231,000 and half were below $231,000.
c. The mean 3-year percentage change on the printout is 481. This means that in the last three years, the average change is 481%, which indicates a large increase.
d. The median 3-year percentage change on the printout is 156. Since the median is the middle value, this means that half, or 15, of the 30 countries' 3-year percentage change values were above 156% and half, or 15, were below 156%.
e. The range is the difference between the largest observation and the smallest observation. From the printout, the largest observation is $4,852 thousand and the smallest observation is $70 thousand. The range is: R = $4,852 - $70 = $4,782 thousand.
f. From the printout, the standard deviation is s = $1,113 thousand.
g. The variance is the standard deviation squared. The variance is: \( s^2 = 1{,}113^2 = 1{,}238{,}769 \) (thousand dollars) squared, that is, 1,238,769 million dollars squared.
h. We would expect most export amounts to fall within 2 standard deviations of the mean, or \( \bar{x} \pm 2s = 653 \pm 2(1{,}113) = 653 \pm 2{,}226 = (-1{,}573,\ 2{,}879) \). Since the exports cannot be negative, the interval would be (0, 2,879).

2.158 a. Using MINITAB, the pie charts are:
[Pie Chart of COLOR: D 16, 5.2%; E 44, 14.3%; F 82, 26.6%; G 65, 21.1%; H 61, 19.8%; I 40, 13.0%]
[Pie Chart of CLARITY: IF 44, 14.3%; VS1 81, 26.3%; VS2 53, 17.2%; VVS1 52, 16.9%; VVS2 78, 25.3%]
The F color occurs the most often with 26.6%. The clarity that occurs the most is VS1 with 26.3%.
The D color occurs the least often with 5.2%. The clarity that occurs the least is IF with 14.3%.
b. Using MINITAB, the histogram is:
[Histogram of CARAT: vertical axis Frequency; horizontal axis CARAT (0.30 to 1.05)]
c. Using MINITAB, the relative frequency histogram for the GIA group is:
[Histogram of CARAT, CERT = GIA: vertical axis Percent; horizontal axis CARAT (0.30 to 1.05)]
d. Using MINITAB, the relative frequency histograms for the HRD and IGI groups are:
[Histograms of CARAT, CERT = HRD and CERT = IGI: vertical axis Percent; horizontal axis CARAT (0.30 to 1.05)]
e. The HRD group does not assess any diamonds less than .5 carats and almost 40% of the diamonds they assess are 1.0 carat or higher. The IGI group does not assess very many diamonds over .5 carats and more than half are .3 carats or less. More than half of the diamonds assessed by the GIA group are more than .5 carats, but the sizes are less than those of the HRD group.
f. The sample mean is: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{194.32}{308} = .631 \)
The average number of carats for the 308 diamonds is .631.
g. The median is the average of the middle two observations once they have been ordered. The 154th and 155th observations are .62 and .62. The average of these two observations is .62. Half of the diamonds weigh less than .62 carats and half weigh more.
h. The mode is 1.0. This observation occurred 32 times.
i. Since the mean and median are close in value, either could be a good descriptor of central tendency.
j. From Chebyshev's Theorem, we know that at least 3/4 or 75% of all observations will fall within 2 standard deviations of the mean. From part f, \( \bar{x} = .631 \).
The variance is: \( s^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{146.19 - \frac{194.32^2}{308}}{308 - 1} = .0768 \) square carats
The standard deviation is: \( s = \sqrt{s^2} = \sqrt{.0768} = .277 \) carats
This interval is: \( \bar{x} \pm 2s = .631 \pm 2(.277) = .631 \pm .554 = (.077,\ 1.185) \)
k. Using MINITAB, the scatterplot is:
[Scatterplot of PRICE vs CARAT: vertical axis PRICE (0 to 18,000); horizontal axis CARAT (0.2 to 1.1)]
As the number of carats increases, the price of the diamond tends to increase. There appears to be an upward trend.

2.159 a. Using MINITAB, a bar graph of the data is:
[Chart of Cause: vertical axis Count; categories Collision, Fire, Grounding, HullFail, Unknown]
Fire and grounding are the two most likely causes of puncture.
b. Using MINITAB, the descriptive statistics are:

Descriptive Statistics: Spillage
Variable   N    Mean    StDev   Minimum   Q1      Median   Q3      Maximum
Spillage   42   66.19   56.05   25.00     32.00   43.00    77.50   257.00

The mean spillage amount is 66.19 thousand metric tons, while the median is 43.00. Since the median is so much smaller than the mean, it indicates that the data are skewed to the right. The standard deviation is 56.05. Again, since this value is so close to the value of the mean, it indicates that the data are skewed to the right.
Since the data are skewed to the right, we cannot use the Empirical Rule to describe the data. Chebyshev's Rule can be used. Using Chebyshev's Rule, we know that at least 8/9 of the observations will fall within 3 standard deviations of the mean.
\( \bar{x} \pm 3s = 66.19 \pm 3(56.05) = 66.19 \pm 168.15 = (-101.96,\ 234.34) \), or (0, 234.34), since we cannot have negative spillage.
Thus, at least 8/9 of all oil spills will be between 0 and 234.34 thousand metric tons.
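The arithmetic in Exercises 2.158 j and 2.159 b can be checked with a short Python sketch. It is only an illustration: the function names are hypothetical, and the inputs are the sums and summary statistics quoted in the solutions above, not the raw data.

```python
import math


def shortcut_variance(sum_x, sum_x_sq, n):
    """Sample variance from sums: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    return (sum_x_sq - sum_x ** 2 / n) / (n - 1)


def k_sd_interval(mean, sd, k, lower_bound=None):
    """Interval mean +/- k*sd, optionally truncated at a natural lower bound."""
    low, high = mean - k * sd, mean + k * sd
    if lower_bound is not None:
        low = max(low, lower_bound)
    return low, high


def chebyshev_min_fraction(k):
    """Chebyshev's Rule: at least 1 - 1/k^2 of observations lie within k sd (k > 1)."""
    return 1 - 1 / k ** 2


# Exercise 2.158 j: carat weights, using the sums quoted in the solution.
var_carat = shortcut_variance(sum_x=194.32, sum_x_sq=146.19, n=308)
sd_carat = math.sqrt(var_carat)
print(round(var_carat, 4), round(sd_carat, 3))

# Exercise 2.159 b: spillage, truncated at 0 because spillage cannot be negative.
low, high = k_sd_interval(mean=66.19, sd=56.05, k=3, lower_bound=0.0)
print(low, round(high, 2), round(chebyshev_min_fraction(3), 3))
```

Passing the lower bound explicitly keeps the truncation at 0 a visible modeling choice rather than something the interval formula does on its own.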
2.160 Using MINITAB, a pie chart of the data is:
[Pie Chart of Recoded defect: False 449, 90.2%; True 49, 9.8%]
A response of 'true' means the software contained defective code. Thus, only 9.8% of the modules contained defective software code.

2.161 a. Since no information is given about the distribution of the velocities of the Winchester bullets, we can only use Chebyshev's Rule to describe the data. We know that at least 3/4 of the velocities will fall within the interval:
\( \bar{x} \pm 2s = 936 \pm 2(10) = 936 \pm 20 = (916,\ 956) \)
Also, at least 8/9 of the velocities will fall within the interval:
\( \bar{x} \pm 3s = 936 \pm 3(10) = 936 \pm 30 = (906,\ 966) \)
b. Since a velocity of 1,000 is much larger than the largest value in the second interval in part a, it is very unlikely that the bullet was manufactured by Winchester.

2.162 a. First, we must compute the total processing times by adding the processing times of the three departments. The total processing times are as follows:

Request   Total Processing Time   Request   Total Processing Time   Request   Total Processing Time
   1            13.3                 17           19.4*                33           23.4*
   2             5.7                 18            4.7                 34           14.2
   3             7.6                 19            9.4                 35           14.3
   4            20.0*                20           30.2                 36           24.0*
   5             6.1                 21           14.9                 37            6.1
   6             1.8                 22           10.7                 38            7.4
   7            13.5                 23           36.2*                39           17.7*
   8            13.0                 24            6.5                 40           15.4
   9            15.6                 25           10.4                 41           16.4
  10            10.9                 26            3.3                 42            9.5
  11             8.7                 27            8.0                 43            8.1
  12            14.9                 28            6.9                 44           18.2*
  13             3.4                 29           17.2*                45           15.3
  14            13.6                 30           10.2                 46           13.9
  15            14.6                 31           16.0                 47           19.9*
  16            14.4                 32           11.5                 48           15.4
                                                                       49           14.3*
                                                                       50           19.0

The stem-and-leaf displays with the appropriate leaves highlighted are as follows:
[Stem-and-leaf displays of Mkt, Engr, and Accnt (leaf unit = 0.10) and of Total (leaf unit = 1.00)]
Of the 50 requests, 10 were lost. For each of the three departments, the processing times for the lost requests are scattered throughout the distributions. The processing times for the departments do not appear to be related to whether the request was lost or not. However, the total processing times for the lost requests appear to be clustered towards the high side of the distribution. It appears that if the total processing time could be kept under 17 days, 76% of the data could be maintained, while reducing the number of lost requests to 1.
b. For the Marketing department, if the maximum processing time was set at 6.5 days, 78% of the requests would be processed, while reducing the number of lost requests by 4. For the Engineering department, if the maximum processing time was set at 7.0 days, 72% of the requests would be processed, while reducing the number of lost requests by 5. For the Accounting department, if the maximum processing time was set at 8.5 days, 86% of the requests would be processed, while reducing the number of lost requests by 5.
c. Using MINITAB, the summary statistics are:

Descriptive Statistics: REQUEST, MARKET, ENGINEER, ACCOUNT
Variable    N    Mean     StDev   Minimum   Q1      Median   Q3       Maximum
MARKET      50    4.766   2.584   0.100     2.825    5.400    6.250   11.000
ENGINEER    50    5.044   3.835   0.400     1.775    4.500    7.225   14.400
ACCOUNT     50    3.652   6.256   0.100     0.200    0.800    3.725   30.000
TOTAL       50   13.462   6.820   1.800     8.075   13.750   16.600   36.200

d.
The z-scores corresponding to the maximum time guidelines developed for each department and the total are as follows:
Marketing: \( z = \frac{x - \bar{x}}{s} = \frac{6.5 - 4.77}{2.58} = .67 \)
Engineering: \( z = \frac{x - \bar{x}}{s} = \frac{7.0 - 5.04}{3.84} = .51 \)
Accounting: \( z = \frac{x - \bar{x}}{s} = \frac{8.5 - 3.65}{6.26} = .77 \)
Total: \( z = \frac{x - \bar{x}}{s} = \frac{17 - 13.46}{6.82} = .52 \)
e. To find the maximum processing time corresponding to a z-score of 3, we substitute the values of z, \( \bar{x} \), and s into the z formula and solve for x:
\( z = \frac{x - \bar{x}}{s} \Rightarrow x - \bar{x} = zs \Rightarrow x = \bar{x} + zs \)
Marketing: \( x = 4.77 + 3(2.58) = 4.77 + 7.74 = 12.51 \). None of the orders exceed this time.
Engineering: \( x = 5.04 + 3(3.84) = 5.04 + 11.52 = 16.56 \). None of the orders exceed this time.
These both agree with both the Empirical Rule and Chebyshev's Rule.
Accounting: \( x = 3.65 + 3(6.26) = 3.65 + 18.78 = 22.43 \). One of the orders exceeds this time, or 1/50 = .02.
Total: \( x = 13.46 + 3(6.82) = 13.46 + 20.46 = 33.92 \). One of the orders exceeds this time, or 1/50 = .02.
These both agree with Chebyshev's Rule but not the Empirical Rule. Both of these last two distributions are skewed to the right.
f. Marketing: \( x = 4.77 + 2(2.58) = 4.77 + 5.16 = 9.93 \). Two of the orders exceed this time, or 2/50 = .04.
Engineering: \( x = 5.04 + 2(3.84) = 5.04 + 7.68 = 12.72 \). Two of the orders exceed this time, or 2/50 = .04.
Accounting: \( x = 3.65 + 2(6.26) = 3.65 + 12.52 = 16.17 \). Three of the orders exceed this time, or 3/50 = .06.
Total: \( x = 13.46 + 2(6.82) = 13.46 + 13.64 = 27.10 \). Two of the orders exceed this time, or 2/50 = .04.
All of these agree with Chebyshev's Rule but not the Empirical Rule.
g. No observations exceed the guideline of 3 standard deviations for both Marketing and Engineering. One observation exceeds the guideline of 3 standard deviations for both Accounting (#23, time = 30.0 days) and Total (#23, time = 36.2 days).
Therefore, only (1/10) × 100% = 10% of the "lost" quotes have times exceeding at least one of the 3 standard deviation guidelines.
Two observations exceed the guideline of 2 standard deviations for both Marketing (#31, time = 11.0 days and #48, time = 10.0 days) and Engineering (#4, time = 13.0 days and #49, time = 14.4 days). Three observations exceed the guideline of 2 standard deviations for Accounting (#20, time = 22.0 days; #23, time = 30.0 days; and #36, time = 18.2 days). Two observations exceed the guideline of 2 standard deviations for Total (#20, time = 30.2 days and #23, time = 36.2 days). Therefore, (7/10) × 100% = 70% of the "lost" quotes have times exceeding at least one of the 2 standard deviation guidelines.
We would recommend the 2 standard deviation guideline since it covers 70% of the lost quotes, while having very few other quotes exceed the guidelines.

2.163 a. Using MINITAB, the time series plot is:
[Time Series Plot of Deaths: vertical axis Deaths (0 to 900); horizontal axis Year (2003 to 2006)]
b. The time series plot is misleading because the information for 2006 is incomplete: it is based on only 2 months, while all of the other years are based on 12 months.
c. In order to construct a plot that accurately reflects the trend in American casualties from the Iraq War, we would want complete data for 2006 and information for the years 2007 through 2011.

2.164 a. Using MINITAB, the time series plot of the data is:
[Time Series Plot of Acquisitions: vertical axis Acquisitions (0 to 900); horizontal axis Year (1980 to 2000)]
b. To find the percentage of the sampled firms with at least one acquisition, we divide the number with acquisitions by the total sampled and then multiply by 100%. For 1980, the percentage of firms with at least one acquisition is (18/1963) × 100% = .92%.
The rest of the percentages are found in the same manner and are listed in the following table:

Year     Number of Firms   Number with Acquisitions   Percentage with Acquisitions
1980          1,963                  18                         .92%
1981          2,044                 115                        5.63%
1982          2,029                 211                       10.40%
1983          2,187                 273                       12.48%
1984          2,248                 317                       14.10%
1985          2,238                 182                        8.13%
1986          2,277                 232                       10.19%
1987          2,344                 258                       11.01%
1988          2,279                 296                       12.99%
1989          2,231                 350                       15.69%
1990          2,197                 350                       15.93%
1991          2,261                 370                       16.36%
1992          2,363                 427                       18.07%
1993          2,582                 532                       20.60%
1994          2,775                 626                       22.56%
1995          2,890                 652                       22.56%
1996          3,070                 751                       24.46%
1997          3,099                 799                       25.78%
1998          2,913                 866                       29.73%
1999          2,799                 750                       26.80%
2000          2,778                 748                       26.93%
TOTAL        51,567               9,123

Using MINITAB, the time series plot is:
[Time Series Plot of Percent: vertical axis Percent (0 to 30); horizontal axis Year (1980 to 2000)]
c. In this case, the two plots are almost the same. In general, the time series plot of the percents would be more informative. By changing the observations to percents, one can compare time periods with different sample sizes on the same basis.

2.165 a. Since the mean is greater than the median, the distribution of the radiation levels is skewed to the right.
b. \( \bar{x} \pm s = 10 \pm 3 = (7,\ 13) \); \( \bar{x} \pm 2s = 10 \pm 2(3) = (4,\ 16) \); \( \bar{x} \pm 3s = 10 \pm 3(3) = (1,\ 19) \)

Interval   Chebyshev's Rule   Empirical Rule
(7, 13)    At least 0         ≈ 68%
(4, 16)    At least 75%       ≈ 95%
(1, 19)    At least 88.9%     ≈ 100%

Since the data are skewed to the right, Chebyshev's Rule is probably more appropriate in this case.
c. The background level is 4. Using Chebyshev's Rule, at least 75% or .75(50) ≈ 38 homes are above the background level. Using the Empirical Rule, ≈ 97.5% or .975(50) ≈ 49 homes are above the background level.
d. \( z = \frac{x - \bar{x}}{s} = \frac{20 - 10}{3} = 3.333 \)
It is unlikely that this new measurement came from the same distribution as the other 50.
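The intervals in part b and the z-score in part d of Exercise 2.165 can be reproduced with a brief Python sketch. It assumes only the summary values used in the solution (mean 10, standard deviation 3, and a new reading of 20); the function and variable names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations (0 when k <= 1)."""
    return max(0.0, 1 - 1 / k ** 2)


MEAN, SD = 10, 3

# One row per interval: (mean - k*sd, mean + k*sd) and its Chebyshev bound.
rows = []
for k in (1, 2, 3):
    interval = (MEAN - k * SD, MEAN + k * SD)
    rows.append((interval, chebyshev_lower_bound(k)))
    print(interval, f"at least {chebyshev_lower_bound(k):.1%}")

# Part d: how far the new reading of 20 lies from the mean.
z_new = z_score(20, MEAN, SD)
print(round(z_new, 3))
```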
Using either Chebyshev's Rule or the Empirical Rule, it is very unlikely to see any observations more than 3 standard deviations from the mean.

2.166 a. Using MINITAB, a pie chart of the data is:
[Pie Chart of PREVUSE: NEVER 71.2%, USED 28.8%]
From the chart, 71.2% or .712 of the sampled physicians have never used ethics consultation.
b. Using MINITAB, a pie chart of the data is:
[Pie Chart of FUTUREUSE: NO 19.5%, YES 80.5%]
From the chart, 19.5% or .195 of the sampled physicians state that they will not use the services in the future.
c. Using MINITAB, the side-by-side pie charts are:
[Pie Chart of PREVUSE, panel variable SPEC. MED: NEVER 70.7%, USED 29.3%. SURG: NEVER 72.1%, USED 27.9%]
The proportion of medical practitioners who have never used ethics consultation is .707. The proportion of surgical practitioners who have never used ethics consultation is .721. These two proportions are almost the same.
d. Using MINITAB, the side-by-side pie charts are:
[Pie Chart of FUTUREUSE, panel variable SPEC. MED: NO 17.3%, YES 82.7%. SURG: NO 23.3%, YES 76.7%]
The proportion of medical practitioners who will not use ethics consultation in the future is .173. The proportion of surgical practitioners who will not use ethics consultation in the future is .233. The proportion of surgical practitioners who will not use ethics consultation in the future is greater than that of the medical practitioners.
e. Using MINITAB, the relative frequency histograms of the years in practice for the two groups of doctors are:
[Histogram of YRSPRAC, panel variable FUTUREUSE (NO, YES): vertical axis Percent; horizontal axis YRSPRAC (0.0 to 37.5)]
The researchers hypothesized that older, more experienced physicians will be less likely to use ethics consultation in the future.
From the histograms, approximately 38% of the doctors that said "no" have more than 20 years of experience. Only about 19% of the doctors that said "yes" had more than 20 years of experience. This supports the researchers' assertion.
f. Using MINITAB, the output is:

Descriptive Statistics: YRSPRAC
Variable   N     N*   Mean     Minimum   Median   Maximum   Mode         N for Mode
YRSPRAC    112   6    14.598   1.000     14.000   40.000    14, 20, 25   9

The mean is 14.598. The average length of time in practice for this sample is 14.598 years. The median is 14. Half of the physicians have been in practice less than 14 years and half have been in practice longer than 14 years. There are 3 modes: 14, 20, and 25. The most frequent years in practice are 14, 20, and 25 years.
g. Using MINITAB, the results are:

Descriptive Statistics: YRSPRAC
Variable   FUTUREUSE   N    N*   Mean     Minimum   Median   Maximum   Mode     N for Mode
YRSPRAC    NO          21   2    16.43    1.00      18.00    35.00     25       5
           YES         91   4    14.176   1.000     14.000   40.000    14, 20   8

The mean for the physicians who would refuse to use ethics consultation in the future is 16.43. The average time in practice for these physicians is 16.43 years. The median is 18. Half of the physicians who would refuse ethics consultation in the future have been in practice less than 18 years and half have been in practice more than 18 years. The mode is 25. The most frequent number of years in practice for these physicians is 25.
h. From the results in part g, the mean for the physicians who would use ethics consultation in the future is 14.176. The average time in practice for these physicians is 14.176 years. The median is 14. Half of the physicians who would use ethics consultation in the future have been in practice less than 14 years and half have been in practice more than 14 years. There are 2 modes: 14 and 20. The most frequent years in practice for these physicians are 14 and 20 years.
i. The results in parts g and h confirm the researchers' theory.
The mean, median, and mode of years in practice are larger for the physicians who would refuse to use ethics consultation in the future than for those who would use ethics consultation in the future.
j. Using MINITAB, the results are:

Descriptive Statistics: YRSPRAC
Variable   N     N*   Mean     StDev   Variance   Range
YRSPRAC    112   6    14.598   9.161   83.918     39.000

The range is 39. The difference between the largest years in practice and the smallest years in practice is 39 years. The variance is 83.918 square years. The standard deviation is 9.161 years.
k. Using MINITAB, the results are:

Descriptive Statistics: YRSPRAC
Variable   FUTUREUSE   N    N*   Mean     StDev   Variance   Range
YRSPRAC    NO          21   2    16.43    10.05   100.96     34.00
           YES         91   4    14.176   8.950   80.102     39.000

For the physicians who would refuse to use ethics consultation in the future, the standard deviation is 10.05 years.
l. For the physicians who would use ethics consultation in the future, the standard deviation is 8.95 years.
m. The variation in the length of time in practice for the physicians who would refuse to use ethics consultation in the future is greater than that for the physicians who would use ethics consultation in the future.
n. Using MINITAB, the scatterplot of the data is:
[Scatterplot of YRSPRAC vs EDHRS: vertical axis YRSPRAC (0 to 40); horizontal axis EDHRS (0 to 1000)]
There does not appear to be much of a relationship between the years of experience and the amount of exposure to ethics in medical school.
o. Using MINITAB, a boxplot of the amount of exposure to ethics in medical school is:
[Boxplot of EDHRS: axis EDHRS (0 to 1000)]
The one data point that is an extreme outlier is the value of 1000.
p.
After removing this data point, the scatterplot of the data is:
[Scatterplot of YRSPRAC2 vs EDHRS2: vertical axis YRSPRAC2 (0 to 40); horizontal axis EDHRS2 (0 to 90)]
With the data point removed, there now appears to be a negative trend to the data. As the amount of exposure to ethics in medical school increases, the years of experience decrease.

2.167 a. Both the height and width of the bars (peanuts) change. Thus, some readers may tend to equate the area of the peanuts with the frequency for each year.
b. Using MINITAB, the frequency bar chart is:
[Chart of Peanut: vertical axis Peanut (0 to 5); horizontal axis Year (1975 to 2010)]

2.168 a. Clinic A claims to have a mean weight loss of 15 pounds during the first month and Clinic B claims to have a median weight loss of 10 pounds in the first month. With no other information, I would choose Clinic B. It is very likely that the distributions of weight losses will be skewed to the right: most people lose in the neighborhood of 10 pounds, but a couple might lose much more. If a few people lost much more than 10 pounds, then the mean will be pulled in that direction.
b. For Clinic A, the median is 10 and the standard deviation is 20. For Clinic B, the mean is 10 and the standard deviation is 5.
For Clinic A: The mean is 15 and the median is 10. This would indicate that the data are skewed to the right. Thus, we will have to use Chebyshev's Rule to describe the distribution of weight losses.
\( \bar{x} \pm 2s = 15 \pm 2(20) = 15 \pm 40 = (-25,\ 55) \)
Using Chebyshev's Rule, we know that at least 75% of all weight losses will be between -25 and 55 pounds. This means that at least 75% of the people will have results ranging from a gain of 25 pounds to a loss of 55 pounds. This is a very large range.
For Clinic B: The mean is 10 and the median is 10. This would indicate that the data are symmetrical.
Thus, the Empirical Rule can be used to describe the distribution of weight losses.
\( \bar{x} \pm 2s = 10 \pm 2(5) = 10 \pm 10 = (0,\ 20) \)
Using the Empirical Rule, we know that approximately 95% of all weight losses will be between 0 and 20 pounds. This is a much smaller range than in Clinic A.
I would still recommend Clinic B. Using Clinic A, a person has the potential to lose a large amount of weight, but also has the potential to gain a relatively large amount of weight. In Clinic B, a person would be very confident that he/she would lose weight.
c. One would want the clients selected for the samples in each clinic to be representative of all clients in that clinic. One would hope that the clinic would not choose those clients for the sample who lost the most weight just to promote their clinic.

2.169 First we make some preliminary calculations. Of the 20 engineers at the time of the layoffs, 14 are 40 or older. Thus, the probability that a randomly selected engineer will be 40 or older is 14/20 = .70. A very high proportion of the engineers is 40 or over.
In order to determine if the company is vulnerable to a disparate impact claim, we will first find the median age of all the engineers. Ordering all the ages, we get:
29, 32, 34, 35, 38, 39, 40, 40, 40, 40, 40, 41, 42, 42, 44, 46, 47, 52, 55, 64
The median of all 20 engineers is \( \frac{40 + 40}{2} = \frac{80}{2} = 40 \).
Now, we will compute the median age of those engineers who were not laid off. The ages underlined above correspond to the engineers who were not laid off. The median of these is \( \frac{40 + 40}{2} = \frac{80}{2} = 40 \).
The median age of all engineers is the same as the median age of those who were not laid off. The median age of those laid off is \( \frac{40 + 41}{2} = \frac{81}{2} = 40.5 \), which is not that much different from the median age of those not laid off. In addition, 70% of all the engineers are 40 or older. Thus, it appears that the company would not be vulnerable to a disparate impact claim.

2.170 Answers will vary.
The graph is made to look like the amount of money spent on education has risen dramatically from 1980 to 2000, but the 4th grade reading scores have not increased at all. The graph does not take into account that the number of school children has also increased dramatically in the last 20 years. A better portrayal would be to look at the per capita spending rather than total spending.

2.171 There is evidence to support this claim. The graph peaks at the interval above 1.002. The heights of the bars decrease in order as the intervals get further and further from the peak interval. This is true for all bars except the one above 1.000. This bar is greater than the bar to its right. This would indicate that there are more observations in this interval than one would expect, suggesting that some inspectors might be passing rods with diameters that were barely below the lower specification limit.