04~solutions_for_chapter_4_exercises下载_在线阅读_7

is_767775

暂无简介

04~solutions_for_chapter_4_exercises Solutions for Chapter 4 Exercises 1 Solutions for Chapter 4 Exercises 4.1 For P1, M2 is 4/3 (2 sec/1.5 sec) times as fast as M1. For P2, M1 is 2 times as fast (10 sec/5 sec) as M2. 4.2 We know the number of instructions and the total time...

Solutions for Chapter 4 Exercises 1 Solutions for Chapter 4 Exercises 4.1 For P1, M2 is 4/3 (2 sec/1.5 sec) times as fast as M1. For P2, M1 is 2 times as fast (10 sec/5 sec) as M2. 4.2 We know the number of instructions and the total time to execute the pro- gram. The execution rate for each machine is simply the ratio of the two values. Thus, the instructions per second for P1 on M1 is (5 × 10 9 instructions/2 seconds) = 2.5 × 10 9 IPS, and the instructions for P1 on M2 is (6 × 10 9 instructions/1.5 sec- onds) = 4 × 10 9 IPS. 4.3 M2 runs 4/3 as fast as M1, but it costs 8/5 as much. As 8/5 is more than 4/3, M1 has the better value. 4.6 Running P1 1600 times on M1 and M2 requires 3200 and 2400 seconds re- spectively. This leaves 400 seconds left for M1 and 1200 seconds left for M2. In that time M1 can run (400 seconds/(5 seconds/iteration)) = 80 iterations of P2. M2 can run (1200 seconds/(10 seconds/iteration)) = 120 iterations. Thus M2 performs better on this workload. Looking at cost-effectiveness, we see it costs ($500/(80 iterations/hour)) = $6.25 per (iteration/hour) for M1, while it costs ($800/(120 iterations/hour)) = $6.67 per (iteration/hour) for M2. Thus M1 is most cost-effective. 4.7 a. Time = (Seconds/cycle) * (Cycles/instruction) * (Number of instructions) Therefore the expected CPU time is (1 second/5 × 10 9 cycles) * (0.8 cycles/instruction) * (7.5 × 10 9 instructions) = 1.2 seconds b. P received 1.2 seconds/3 seconds or 40% of the total CPU time. 4.8 The ideal instruction sequence for P1 is one composed entirely of instructions from class A (which have CPI of 1). So M1's peak performance is (4 × 10 9 cy- cles/second)/(1 cycle/instruction) = 4000 MIPS. Similarly, the ideal sequence for M2 contains only instructions from A, B, and C (which all have a CPI of 2). So M2's peak performance is (6 × 10 9 cycles/second)/ (2 cycles/instruction) = 3000 MIPS. 4.9 The average CPI of P1 is (1 × 2 + 2 + 3 + 4 + 3)/6 = 7/3. The average CPI of P2 is (2 × 2 + 2 + 2 + 4 + 4)/6 = 8/3. P2 then is ((6 × 10 9 cycles/second)/(8/3 cycles/instruction))/((4 × 10 9 cycles/second)/(7/3 cycles/instruction)) = 21/16 times faster than P1. 4.10 Using C1, the average CPI for I1 is (.4 * 2 + .4 * 3 + .2 * 5) = 3, and the average CPI for I2 is (.4 * 1 + .2 * 2 + .4 * 2) = 1.6. Thus, with C1, I1 is ((6 × 10 9 cycles/sec- ond)/(3 cycles/instruction))/((3 × 10 9 cycles/second)/(1.6 cycles/instruction)) = 16/15 times as fast as I2. 2 Solutions for Chapter 4 Exercises Using C2, the average CPI for I2 is (.4 * 2 + .2 * 3 + .4 * 5) = 3.4, and the average CPI for I2 is (.4 * 1 + .4 * 2 + .2 * 2) = 1.6. So with C2, I2 is faster than I1 by factor of ((3 × 10 9 cycles/second)/(1.6 cycles/instruction))/((6 × 10 9 cycles/second)/(3.4 cycles/instruction)) = 17/16. For the rest of the questions, it will be necessary to have the CPIs of I1 and I2 on programs compiled by C3. For I1, C3 produces programs with CPI (.6 * 2 + .15 * 3 + .25 * 5) = 2.9. I2 has CPI (.6 * 1 + .15 * 2 + .25 * 2) = 1.4. The best compiler for each machine is the one which produces programs with the lowest average CPI. Thus, if you purchased either I1 or I2, you would use C3. Then performance of I1 in comparison to I2 using their optimal compiler (C3) is ((6 × 10 9 cycles/second)/(2.9 cycles/instruction))/((3 × 10 9 cycles/second)/(1.4 cycles/instruction)) = 28/29. Thus, I2 has better performance and is the one you should purchase. 4.11 Program P running on machine M takes (10 9 cycles/seconds) * 10 seconds = 10 10 cycles. P ′ takes (10 9 cycles/seconds) * 9 seconds = 9 × 10 9 cycles. This leaves 10 9 less cycles after the optimization. Everytime we replace a mult with two adds, it takes 4 – 2 * 1 = 2 cycles less per replacement. Thus, there must have been 10 9 cycles /(2 cycles/replacement) = 5 × 10 8 replace- ments to make P into P ′ . 4.12 The first option reduces the number of instructions to 80%, but increases the time to 120%. Thus it will take: 0.8 * 1.2 = 0.96 as much time as the initial case. The second option removes 20%/2 = 10% of the instructions and increases the time taken to 110%. Therefore it will take 0.9 * 1.1 = 0.99 times as much time as the initial case. Therefore, the first option is the faster of the two, and it is faster than the orginial, so you should have hardware automatically do garbage collection. 4.13 Let I = number of instructions in program and C = number of cycles in pro- gram. The six subsets are {clock rate, C} {cycle time, C} {MIPS, I} {CPI, C, MIPS} {CPI, I, clock rate} {CPI, I, cycle time}. Note that in every case each subset has to have at least one rate {CPI, clock rate, cycle time, MIPS} and one absolute {C, I}. 4.14 The total execution time for the machines are as follows: Computer A = 2 + 20 + 200 seconds = 222 seconds Computer B = 5 + 20 + 50 seconds = 75 seconds Computer C = 10 + 20 + 15 seconds = 45 seconds Solutions for Chapter 4 Exercises 3 Thus computer C is faster. It is 75/45 = 5/3 times faster than computer B and 222/45 = 74/15 times faster than computer A. 4.15 With the new weighting the total execution time for the group of programs is: Computer A = 8 * 2 + 2 * 20 + 1 * 200 seconds = 256 seconds Computer B = 8 * 5 + 2 * 20 + 1 * 50 seconds = 130 seconds Computer C = 8 * 10 + 2 * 20 + 1 * 15 seconds = 135 seconds So with this workload, computer B is faster by a factor of 135/130 = 1.04 with respect to computer C and a factor of 256/130 = 1.97 with respect to computer A. This new weighting reflects a bias from the previous results by a bias toward pro- gram 1 and program 2, which resulted in computer A and computer B looking comparitively better than before. 4.16 Comparing the times of the program executions on computer A, we see that to get an equal amount of execution time, we will have to run program 1 100 times, program 2 10 times, and Program 3 1 time. This results in the following execution times: Computer A = 100 * 2 + 10 * 20 + 1 * 200 seconds = 600 seconds Computer B = 100 * 5 + 10 * 20 + 1 * 50 seconds = 750 seconds Computer C = 100 * 10 + 10 * 20 + 1 * 15 seconds = 1215 seconds So computer A is fastest with this workload. Using computer B’s program execution times to determine a weighting, we get a ratio of 20:5:2 for program 1, program 2, and program 3, respectively. This results in the following execution times: Computer A = 20 * 2 + 5 * 20 + 2 * 200 seconds = 540 seconds Computer B = 20 * 5 + 5 * 20 + 2 * 50 seconds = 300 seconds Computer C = 20 * 10 + 5 * 20 + 2 * 15 seconds = 330 seconds So in this case, computer B is fastest. Using the execution times on computer C , we get a 6:3:4 ratio, resulting in the fol- lowing total execution times: Computer A = 6 * 2 + 3 * 20 + 4 * 200 seconds = 872 seconds Computer B = 6 * 5 + 3 * 20 + 4 * 50 seconds = 290 seconds Computer C = 6 * 10 + 3 * 20 + 4 * 15 seconds = 180 seconds So in this case computer C is fastest. 4 Solutions for Chapter 4 Exercises As we did the weighting to equalize the times on one particular machine, we ended up running programs that that computer could do the fastest most often and the programs that it was slower on less often. This will tend to be a compara- tive improvement in the execution time for the machine we are normalizing time to (as the weightings are not guaranteed to bias towards the programs that the other machines are better at). In this example, this type of weighting was enough to make that computer appear the fastest. 4.17 We know CPI is equal to (Cycles/second)/(Instructions/second). So the CPI of P1 on M1 is (4 × 10 9 cycles/second)/(2.5 × 10 9 instructions/second) = 1.6 CPI, and the CPI of P1 on M2 is (6 × 10 9 cycles/second)/(4 × 10 9 instructions/second) = 1.5 CPI. 4.18 We have the CPI, the clock rate, and the total execution time, and we're try- ing to find the total number of instructions. Using the following equation: (Cycles/instruction)/(Cycles/second) * Instructions = (Execution time) We find that there are (5 seconds) * (4 × 10 9 cycles/second)/(0.8 cycles/instruc- tion) = 12.5 × 10 9 instructions in P2 on M1, and (10 seconds) * (6 × 10 9 cycles/second)/(1.5 CPI) = 40 × 10 9 instructions in P2 on M2. 4.19 No solution provided. 4.20 No solution provided. 4.21 No solution provided. 4.22 Using Amdahl’s law (or just common sense), we can determine the follow- ing: � Speedup if we improve only multiplication = 100/(30 + 50 + 20/4) = 100/85 = 1.18. � Speedup if we only improve memory access = 100/(30 + 50/2 + 20)) = 100/75 = 1.33. � Speedup if both improvements are made = 100/(30 + 50/2 + 20/4) = 100/60 = 1.67. 4.23 The problem is solved algebraically and results in the equation 100/( Y + (100 – X – Y ) + X /4) = 100/( X + (100 – X – Y ) + Y /2) where X = multiplication percentage and Y = memory percentage. Solving, we get memory percentage = 1.5 × multiplication percentage. Many examples thus exist: for example, multiplication = 20%, memory = 30%, other = 50%, or multiplica- tion = 30%, memory = 45%, other = 25%, and so on. Solutions for Chapter 4 Exercises 5 4.24 Rewrite the execution time equation: Rewrite execution time affected by improvement as execution time before improve- ment × f , where f is the fraction affected. Similarly execution time unaffected. The denominator has two terms: the fraction improved ( f ) divided by the amount of the improvement and the fraction unimproved (1 – f ). 4.25 We can just take the GM of the execution times and use the inverse. , , and , so C is fastest. 4.26 A, B: B has the same performance as A. If we run program 2 once, how many times should we run program 1? x + 1000 = 10 x + 100, or x = 100. So the mix is 99% program 1, 1% program 2. B, C: C is faster by the ratio of . Program 2 is run once, so we have 10 x + 100 = 1.6 × (20 x + 20), x = 3.1 times. So the mix is 76% program 1 and 24% program 2. A, C: C is also faster by 1.6 here. We use the same equation, but with the proper times: x + 1000 = 1.6 × (20 x + 20), x = 31.2. So the mix is 97% program 1 and 3% program 2. Note that the mix is very different in each case! Speedup Execution time before improvement Execution time after improvement ------------------------------------------------------------------------------------------- = Execution time after improvement Execution time affected by improvement Amount of improvement ------------------------------------------------------------------------------------------------------- Execution time unaffected += = Execution time affected Amount of improvement Execution time unaffected × + Amount of improvement ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- = Execution time before improvement f × Amount of improvement --------------------------------------------------------------------------------------------------- Execution time before improvement 1 f – ( )× + = f Amount of improvement --------------------------------------------------------------- 1 f – ( ) +    Execution time before improvement × Speedup Execution time before improvement f Amount of improvement --------------------------------------------------------------- 1 f – ( ) +    Execution time before improvement × ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- = Speedup 1 f Amount of improvement --------------------------------------------------------------- 1 f – ( ) +    -------------------------------------------------------------------------------------------- = GM(A) 1000 32= = GM(B) 1000 32= = GM(C) 400 20= = 32 20 ------ 1.6= 6 Solutions for Chapter 4 Exercises 4.27 No solution provided. 4.28 No solution provided. 4.29 No solution provided. 4.30 4.31 The harmonic mean of a set of rates, where AM is the arithmetic mean of the corresponding execution times. 4.32 No solution provided. 4.33 The time of execution is (Number of instructions) * (CPI) * (Clock period). So the ratio of the times (the performance increase) is: = 1/(Reduction in instruction count) * (2.5 improvement in CPI) Reduction in instruction count = .2475. Thus the instruction count must have been reduced to 24.75% of the original. 4.34 We know that (Number of instructions on V) * (CPI on V) * (Clock period) 5 = (1/1.5) * (CPI of V)/(1.5 CPI) CPI of V = 11.25. 4.45 The average CPI is .15 * 12 cycles/instruction + .85 * 4 cycles/instruction = 5.2 cycles/instructions, of which .15 * 12 = 1.8 cycles/instructions of that is due to multiplication instructions. This means that multiplications take up 1.8/5.2 = 34.6% of the CPU time. Program Computer A Computer B Computer C 1 10 1 0.5 2 0.1 1 5 HM n 1 Ratei ------------ i = 1 n ∑ --------------------- = n Time i i = 1 n ∑ ----------------------- = 1 Time i i = 1 ∑ n ----------------------- ----------------------- = 1 1 n --- Time i i = 1 n ∑ --------------------------- = 1 AM --------- = 10.1 Number of instructions( )* CPI( )* Clock period( ) Number of instructions w/opt.( )* CPI w/opt.( )* Clock period( )------------------------------------------------------------------------------------------------------------------------------------------- = 5 Time on V( ) Time on P( )---------------------------- Number of instructions on V) * (CPI on V) * (Clock period( ) Number of instructions on P) * (CPI on P) * (Clock period( )----------------------------------------------------------------------------------------------------------------------------------------= = Solutions for Chapter 4 Exercises 7 4.46 Reducing the CPI of multiplication instructions results in a new average CPI of .15 * 8 + .85 * 4 = 4.6. The clock rate will reduce by a factor of 5/6 . So the new performance is (5.2/4.6) * (5/6) = 26/27.6 times as good as the original. So the modification is detrimental and should not be made. 4.47 No solution provided. 4.48 Benchmarking suites are only useful as long as they provide a good indicator of performance on a typical workload of a certain type. This can be made untrue if the typical workload changes. Additionally, it is possible that, given enough time, ways to optimize for benchmarks in the hardware or compiler may be found, which would reduce the meaningfulness of the benchmark results. In those cases changing the benchmarks is in order. 4.49 Let T be the number of seconds that the benchmark suite takes to run on Computer A. Then the benchmark takes 10 * T seconds to run on computer B. The new speed of A is (4/5 * T + 1/5 * ( T /50)) = 0.804 T seconds. Then the performance improvement of the optimized benchmark suite on A over the benchmark suite on B is 10 * T/(0.804 T ) = 12.4. 4.50 No solution provided. 4.51 No solution provided. 4.52 No solution provided.

本文档为【04~solutions_for_chapter_4_exercises】，请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑，图片更改请在作品中右键图片并更换，文字修改请直接点击文字进行修改，也可以新增和删除文档中的内容。

04~solutions_for_chapter_4_exercises

热门搜索

历史搜索