Q1 (10 pts)
You are tasked with improving the performance of a functional unit. The computation for the functional unit has 4 steps (A-D), and each step is indivisible. Assume there is no dependency between successive computations.
A) What is the greatest clock rate speedup possible with pipelining? You do not need to worry about register timing constraints (e.g., delay, setup, hold). Explain your reasoning.
B) To maximize the clock rate, what is the minimum number of pipeline registers you would use? Where would you insert the registers (draw or describe) in the datapath provided for this functional unit? Why not use fewer or more pipeline stages?
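As a rough sketch of the arithmetic behind this kind of question, the Python below computes the clock rate speedup and a minimal stage grouping for a set of indivisible steps. The step latencies are hypothetical placeholders, not the values from the provided datapath, and the function names are illustrative. Register overhead is ignored, as the question allows.

```python
def pipeline_speedup(step_delays):
    """Return (unpipelined period, pipelined period, clock rate speedup).

    Steps are indivisible, so the best pipelined period is the slowest step.
    Register delay, setup, and hold times are ignored.
    """
    unpipelined = sum(step_delays)
    pipelined = max(step_delays)
    return unpipelined, pipelined, unpipelined / pipelined


def group_stages(step_delays):
    """Greedily merge adjacent steps into stages no longer than the slowest step.

    Returns a list of stages, each a list of step indices. The number of
    registers between stages is len(result) - 1.
    """
    limit = max(step_delays)
    stages, current, current_delay = [], [], 0
    for index, delay in enumerate(step_delays):
        if current and current_delay + delay > limit:
            stages.append(current)          # close the stage before it exceeds the limit
            current, current_delay = [], 0
        current.append(index)
        current_delay += delay
    stages.append(current)
    return stages


# Hypothetical latencies for steps A-D (arbitrary time units), for illustration only.
delays = [2, 4, 3, 1]
print(pipeline_speedup(delays))
print(group_stages(delays))
```

With these made-up latencies, the slowest step bounds the clock period, and adjacent short steps can share a stage without lengthening it, which is why adding more registers than necessary gives no further clock rate benefit.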
Q2 (12 pts)
For the following questions, it will be helpful to draw the pipeline progression (e.g., F D X M W, which stand for the different pipeline stages) for each instruction and trace the clock cycle in which each instruction's computation actually takes place. That way, you will know the correct operand values for each computation.
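One way to produce such a diagram is sketched below in Python. It assumes one instruction enters the pipeline per cycle with no stalls or forwarding. The instruction labels I1 to I3 are placeholders, not the code from the question, and the function names are illustrative.

```python
STAGES = ["F", "D", "X", "M", "W"]


def pipeline_diagram(instructions):
    """Return (label, cells) rows; cells[c] is the stage occupied in cycle c + 1.

    Assumes one instruction is fetched per cycle and nothing ever stalls.
    """
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, label in enumerate(instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage            # instruction i starts in cycle i + 1
        rows.append((label, cells))
    return rows


def format_diagram(rows):
    """Render the rows as a fixed-width text table with a cycle header."""
    width = max(len(label) for label, _ in rows)
    cycles = len(rows[0][1])
    header = " " * width + " | " + " ".join(f"{c + 1:>2}" for c in range(cycles))
    lines = [header]
    for label, cells in rows:
        lines.append(f"{label:<{width}} | " + " ".join(f"{cell:>2}" for cell in cells))
    return "\n".join(lines)


# Placeholder instruction labels, for illustration only.
print(format_diagram(pipeline_diagram(["I1", "I2", "I3"])))
```

Reading down a column shows which stage each instruction occupies in that cycle, which makes it easier to see whether a register is read in D before an earlier instruction has written it in W.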
A) (4 pts) Assume that x11 is initialized to 11 and x12 is initialized to 22. Suppose you executed the code below on a version of the pipeline that does not handle data hazards (i.e., the compiler is responsible for addressing data hazards by inserting NOP instructions where necessary).

