____________________________________________________________________________ CSEE E6861y Handout #34b Prof. Steven Nowick April 9, 2016 ____________________________________________________________________________ ____________________________________________________________________________ FINAL CAD MINI-PROJECT -------------------------------------------------------------- DESIGNING A RETIMING TOOL FOR SEQUENTIAL CIRCUIT OPTIMIZATION -------------------------------------------------------------- This project is due on MONDAY, MAY 2, at 4pm. Further details on what to submit will be posted shortly. You will need to hand in a complete working program, with well-commented code, and a short project report. There will also be a required CHECKPOINT on APRIL 26-27. Details will be announced shortly. ____________________________________________________________________________ ____________________________________________________________________________ INTRODUCTION: In class lectures, in basic De Micheli ("DM") reading (sec. 9.3.1, p. 462-mid. p 467) and in basic Leiserson/Saxe ("LS") reading (Handout #33, Abstract and secs. 1-4), we covered the foundations of an important system-level sequential optimization step: RETIMING. In retiming, a user supplies an initial design for a sequential system, including registers and combinational logic (often as a netlist of individual gates or small function units). This optimization does not modify any combinational logic: it simply finds an optimal legal movement of the registers, to improve overall system cost, while still preserving the input/output behavior of the system. Two common simple costs are targeted in the literature: (i) cycle-time minimization, i.e. delay; and (ii) register count reduction, i.e. area. We have already covered a basic method for (i), but have not yet covered (ii). Both of the above algorithms are limited, since each focuses only on a single cost target. In contrast, much industrial design requires consideration of "multi-objective cost functions", where more than one cost is optimized. Retiming has also been generalized to an advanced composite cost function: (iii) area minimization under a given cycle-time constraint "phi". (There is also a dual cost function, delay minimization under a given area constraint A, which we will not consider.) This is a 'two-tiered' cost function, which first optimizes a PRIMARY COST (i.e. guaranteeing a cycle time of "phi"); then, of all such solutions, finds the one which is best under a SECONDARY COST (i.e. minimizing the total number of registers, or area). Effectively, this cost target allows the user to first meet a timing requirement, and then further optimize on area (i.e. number of registers). A particular case is especially interesting for (iii): if one gives it the target of the lowest feasible clock cycle, "phi_opt" (previously computed by method (i)), it will FIND A SOLUTION MEETING THE LOWEST CYCLE TIME WITH *FEWEST POSSIBLE REGISTERS*. ===================== WHAT TO DO: OVERVIEW ===================== In this assignment, you will design your own retiming CAD tool. Your tool will target two of the above cost functions: (i) cycle-time minimization (iii) area minimization under a given cycle-time constraint "phi" For (i), you will implement a faster algorithm than the one covered in class. (It is called "OPT2" in Handout #33, and uses a step called "FEAS".) For (iii), you will formulate this important two-tiered cost problem as a "LINEAR PROGRAM", and solve it using your own linear programming solver. Your tool will first meet a cycle time requirement, then optimize secondarily for area. In particular, given an initial synchronous network for a system: Your tool for (i) will perform *OPTIMAL MINIMUM-CYCLE RETIMING*: it identifies a retiming solution with the *SMALLEST POSSIBLE CLOCK CYCLE TIME*, and which can be obtained from the original solely through movement of registers. Likewise, given any feasible clock cycle (such as the "phi_opt" obtained from (i) above), your tool for (iii) will perform *OPTIMAL AREA MINIMIZATION UNDER THIS CYCLE TIME*: it finds the solution meeting the target cycle time which has *FEWEST REGISTERS* and which can be obtained from the original solely through movement of registers. Both tools will compute exact optimum solutions. NOTE THAT NEITHER TOOL REQUIRES MUCH PROGRAMMING! However, you will need to first carefully learn the intricacies of the retiming method, and associated algorithms. Hence, a substantial portion of your project work will be devoted to learning foundations of the retiming methods and associated algorithms. As part of this project, you will learn and implement two important general algorithms: (a) Floyd-Warshall (also known as Warshall-Floyd) [used in both (i) and (iii)]; and (b) the Simplex Method to solve linear programs [used in (iii)]. Floyd-Warshall is a widely-used graph algorithm, computing all-pairs shortest (or longest paths). It uses dynamic programming. Linear programming (LP) is powerful technique to capture constraint-based problems under user-specified cost functions. The Simplex method is used to solve LP problems, using an iterative-improvement technique. Both Floyd-Warshall and Simplex algorithms are exact, providing optimum solutions. The output of tool (i) will be a minimum-cycle-time retiming vector (identifying all register movements), as well as the resulting optimum clock cycle time, "phi_opt", at which the retimed system can be clocked. The output of tool (iii) will be a retiming vector which guarantees that the optimized design will meet a specified target cycle time "phi"; and it will have the minimum total number of registers of all such solutions. In sum, your two CAD tools will provide a powerful optimization capability, which will enable real-world digital systems (such as filters, multimedia processors, etc.) to obtain higher clock rates and minimize register count. ____________________________________________________________________________ ____________________________________________________________________________ RELEVANT READING: For this problem, you will be closely following the De Micheli and Leiserson/Saxe reading. The reading includes previous assigned reading, plus some new sections and sources. ____________________________________________________________________________ ===================================== a. Retiming Basics (already assigned) ===================================== COST (i) = minimum-cycle time: basic method --------------- 1. DE MICHELI: --------------- retiming basics: ch. 9.3.1, top p. 462 - mid. p. 467 Bellman-Ford algorithm: ch. 2.4.1, top p. 54 - mid. p. 57 (including through end of of Example 2.4.2) Floyd-Warshall algorithm: you find (many resources available, including Wiki pages online Cormen/Leiserson et al. book "Introduction to Algorithms" [see below], etc.) ==> NOTE: Bellman-Ford is used in this basic method only, but *YOU WILL NOT USE IT* for this project <== --------------------------------- 2. Leiserson/Saxe (Handout #33): --------------------------------- retiming basics: Abstract, secs. 1-4 (skip/skim proofs; focus on concepts, theorems and algorithms) NOTE: both (a) and (b) cover the same material, methods, etc., but each has some different insights. ____________________________________________________________________________ ========================================================= b. NEW: Advanced Retiming: improved method for cost (i) ========================================================= COST (i) = MINIMUM-CYCLE TIME: Advanced method, with better runtime and an alternative flow to the basic method (i) above. This reading introduces the "FEAS" algorithm, which is much more efficient than the Bellman-Ford algorithm. In your Part I CAD tool, you will use FEAS, instead of Bellman-Ford. --------------- 1. DE MICHELI: --------------- ch. 9.3.1, mid. p. 467 - bottom p. 469 - Introduces "FEAS" which is used to replace Bellman-Ford in the min-cycle-time Algorithm 9.3.1 --------------------------------- 2. Leiserson/Saxe (Handout #33): --------------------------------- sec. 5: "A More Efficient Algorithm for Clock-Period Minimization" - Introduces "FEAS" which is used to replace Bellman-Ford in the min-cycle-time Algorithm OPT1; the new algorithm is called OPT2 - NOTE: Algorithm "CP" is covered on p. 10-11 of Handout #33, and will be used in FEAS. NOTE: both (a) and (b) cover identical material, methods, etc., but each has some different insights. ____________________________________________________________________________ ======================================= c. NEW: Advanced Retiming: cost (iii) ======================================= COST (i) = MINIMUM-REGISTER COUNT FOR A GIVEN CLOCK CYCLE TIME "phi": Advanced method, for two-tiered cost function (iii). --------------- 1. DE MICHELI: --------------- ch. 9.3.1, "Area Minimization by Retiming." - READ *ONLY*: bottom p. 469 - mid. p. 471 (SKIP from paragraph "Note that the timing feasibility problem" to end of section) ____________________________________________________________________________ ============================================== d. NEW: Linear Programming and Simplex Method ============================================== -------------------------------- 1) ONLINE TUTORIAL (INTERACTIVE) -------------------------------- The following is an excellent hands-on tutorial, on simplex method. It covers both standard maximization problems and general linear programming problems. For this assignment, you will need to master material from both parts. PART 1: solving standard maximization problems http://www.zweigmedia.com/RealWorld/tutorialsf4/framesSimplex.html PART 2: solving general linear programming problems http://www.zweigmedia.com/RealWorld/tutorialsf4/framesSimplex2.html ==> You will be following the tableau format, pivot methodology, termination conditions, etc. of Part 1 for your CAD tool. ==> You will need Part 2 to learn how to convert a minimization problem to a maximization problem. You will also need it to handle any LP systems whose initial basic solution is "not feasible", to convert into a feasible initial tableau. See details below. ------------------------ 2) READING: CORMEN BOOK ------------------------ T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, "Introduction to Algorithms", (3rd EDITION ONLY) ONLINE ACCESS: You can access the electrical version of the book through the Columbia University Lbrary website, by searching the title of the book. Your UNI and PWD are required. ch. 29, LINEAR PROGRAMMING intro.: pp. 843-850 (ALL) ch. 29.1: standard and slack forms (ALL) ch. 29.3: the simplex algorithm read pp. 864 bottom - 869 top (skim rest) ch. 29.5: the initial basic feasible solution read p. 886 to bottom skim bottom p. 886 - bottom p. 888 read/skim bottom p. 888 ("We now demonstrate ...") to top p. 890 NOTE: The Cormen reading should give you insight as to motivation and examples of linear programming. However, it gives a *different* solution method and format for SIMPLEX than the online tutorials above. Use the Cormen book as a background reference only; FOLLOW THE SIMPLEX METHOD OUTLINED IN THE TUTORIALS (#1) ABOVE for your program. ____________________________________________________________________________ ____________________________________________________________________________ DETAILS: This section gives further details on the input specification, the method, and the resulting output. ____________________________________________________________________________ ====== Input: ====== ----------------------------- (a) synchronous network graph ----------------------------- Your tool will take as input a synchronous network graph G, describing a sequential system, as covered in assigned reading. Use the more complete definition from LS, pp. 8-9, instead of the DM definition: G = (V, E, d, w) where: - V is the set of vertices, - E is the set of edges, - d is a vector assigning a delay to each vertex, and - w is a vector assigning a weight to each edge. Assume each delay (d_i) and each weight (w_j) is a non-negative integer. Also, assume that the graph has a special vertex, v_0, which is the 'source' or 'host'. In the correlator example, it is shown as "v_h". This vertex v_0 HAS A DELAY OF 0 (i.e. d_0 = 0). ------------------------------------------------- (b) area minimization under cycle time constraint ------------------------------------------------- Your tool will also take a secondary input for area minimization under cycle time constraint, besides the synchronous network graph, which is the *target cycle time*, "phi". You can assume that this cycle time is *always* feasible, i.e. it is greater than or equal to the optimal cycle time. That is, your tool does not need to handle cases where the input cycle time is too small and no feasible solution exists. ==> More details of the input format will be released shortly. ____________________________________________________________________________ ======= METHOD: ======= You will follow the method as presented in the given reading and online tutorials (listed above). You will create 2 CAD tools (or, if you prefer, 1 tool with 2 modes). Part #1: covers the tool for the optimized method for cost target (i). Part #2: covers the tool for the two-tiered cost target (iii). --------------------------------------------------------------------------- =================================================== PART I: AN OPTIMIZED METHOD FOR MINIMUM CYCLE TIME =================================================== You will implement an optimized algorithm for minimum cycle time retiming, as outlined in the Reading above (part (b)). Both De Micheli and Leiserson/Saxe cover the same material, but with some different terminology. The algorithms are identical. ------------------- De Micheli version: ------------------- The De Micheli reading highlights a new step, called "FEAS" (Algorithm 9.3.2). A detailed example is also shown. The FEAS step is used to *replace Bellman-Ford* in Algorithm 9.3.1. In addition, two other steps which were needed for Bellman-Ford in the original Algorithm 9.3.1 -- construct inequalities (9.4), construct inequalities (9.5) -- are *now omitted* since FEAS does not need them. Note that in De Micheli's FEAS algorithm, the step 'compute the set of data-ready times' may be unclear, since you will have to trace earlier in the book to get this. It is clearer in the Leiserson/Saxe version. ----------------------- Leiserson/Saxe version: ----------------------- FEAS is also covered in Handout #33. The algorithm uses the CP method, which is described earlier in the handout. The new algorithm is called "OPT2". NOTE: in both the DM and LS versions, the resulting retiming vector is always *non-negative*: all entries are 0 or positive integers. ======== Details: ======== In summary, there are 2 main steps: (i) construct the W and D matrices, and (ii) solve the resulting retiming problem using FEAS. ------------------------------------- Step (i): construct W and D matrices ------------------------------------- For this step, you must use the Floyd-Warshall algorithm. You can locate details on the algorithm online, or in books. This highly-efficient algorithm finds "all-pair shortest-paths" in a graph, assuming only positive (or only negative) edge weights. When using Floyd-Warshall, the goal is to create an initial matrix, and *update* it in place (i.e. overwrite entries appropriately). Do not recopy the matrix for each iteration, just update it. For retiming, as an important optimization, both W and D can be computed simultaneously through a *single* run of the Floyd-Warshall algorithm. Details of the approach are presented in the LS reading. INCLUDE THIS OPTIMIZATION IN YOUR TOOL. Examples of W and D matrices for the correlator are presented in LS reading. Finally, based only the W and D matrices, you should COMPUTE THE INITIAL CLOCK CYCLE TIME, "phi_init", of the initial system. -------------------------------------- Step (ii): solve the retiming problem -------------------------------------- For this step, follow the techniques of both LS and DM reading to solve the retiming problem, as summarized above. --------------------------------------------------------------------------- ======================================================= PART II: AREA MINIMIZATION UNDER CYCLE TIME CONSTRAINT ======================================================= This problem is covered in the above Reading, parts (c) and (d). The linear program for this problem is presented in De Micheli on p. 471, near the top. You will follow the online tutorials (and use the Cormen reading as background). ======== Details: ======== In summary, there are 3 main steps: (i) construct the W and D matrices, (ii) formulate as a linear programming (LP) problem, and (iii) solve the LP problem using the simplex method. ------------------------------------- Step (i): construct W and D matrices ------------------------------------- Same method as for Part I. -------------------------------- Step (ii): formulate LP problem -------------------------------- The LP problem is presented in DM at top of p. 471, by the system of equations under the min cost constraint. The LP program includes both an "objective function" (min c^T r) and a set of linear inequalities. The inequalities are also presented in two equivalent theorems: Theorem 9.3.1 in De Micheli, and Theorem 7 in Leiserson/Saxe. As an optimization, both DM and LS cover how SOME CONSTRAINTS CAN BE ELIMINATED, which are dominated by others. See discussion in DM Example 9.3.10, and in LS (both in Table I caption and last paragraph of Section 4). For full credit, you will be implementing this optimization to eliminate unnecessary constraints. -------------------------------------------------- Step (iii): solve LP problem using simplex method -------------------------------------------------- You will create an initial tableau for the above LP problem, then solve it, using the simplex method. Your tool will implement the simplex method, following the method presented in the online tutorials. There are 2 potential complications you will need to handle: (a) your initial LP problem has a 'min', not a 'max', operation (b) the "basic solution" of your initial tableau is *not feasible* (i.e. includes negative values of variables) For (a), the transformation of a 'min' to a 'max' objective function is described in detail in both the 'general linear programming method' online tutorial, and in the Cormen reading. You should apply this transform, before creating your initial tableau. For (b), the 'general linear programming method' online tutorial provides an example of a basic *infeasible* solution. It also includes a complete method to transform the initial tableau with basic infeasible solution to an initial tableau with basic *feasible* solution, through a special type of pivot operation applied iteratively. You should first apply this transform -- but only if your initial tableau has a basic infeasible solution. Once you have obtained a correct initial tableau, in slack form, your program should apply the standard simplex steps (see 'standard' online tutorial) to generate an optimum solution. NOTE: in this method, the resulting retiming vector is always *non-negative*: all entries are 0 or positive integers. Further details and hints will be provided soon. ____________________________________________________________________________ ======= Output: ======= --------------------------------------------- (a) output of min-cycle-time retiming problem --------------------------------------------- The main output of your tool will be: 0. the initial area of the synchronous network graph (i.e. total number of registers) before retiming 1. the optimal retiming vector "r" (with zero and positive elements) 2. the resulting synchronous network graph after retiming with vector "r" 3. the corresponding final clock cycle time "phi_opt", and final area (i.e. total number of registers), after retiming In additional, you will output some intermediate results: 4. W and D matrices 5. initial clock cycle time "phi_init" (computed only using W and D matrices -- see discussion above) 6. when computing W and D, the internal matrix that is updated in Floyd-Warshall algorithm after *each* iteration 7. a sorted array for the elements in matrix D, after you run a sorting algorithm 8. some intermediate results for the FEAS algorithm: Given each run of FEAS, with a desired cycle time "phi", after *each iteration*: (i) the combinational component graph of the network (see DM p. 470) (ii) the data-ready time (\Delta(v)) for each vertex v after running CP on the graph obtained in (i) above (iii) the list of vertices and their data-ready times, which do not satisfy the desired cycle time "phi" -------------------------------------------------------------------- (b) output of min-area re-timing problem under cycle-time constraint -------------------------------------------------------------------- The main output of your tool will be: 0. the initial area of the synchronous network graph (i.e. total number of registers) before retiming 1. the optimal retiming vector "r" (with zero and positive elements), assuming that the given cycle-time is feasible 2. the resulting synchronous network graph after retiming with vector "r" 3. its corresponding final area (in terms of the total # of registers), after retiming In additional, you will output some intermediate results: 4. the c vector, indicating the differences between the in-degree and out-degree of each vertex (see DM book, p. 470) 5. list each basic constraint in the initial linear program (see DM book, mid p. 471), for the clock cycle constraint provided 6. show the reduced list of constraints of #5 above -- after applying the optimization presented in DM and LS (see discussion above) 7. some intermediate results for simplex algorithm: (i) the initial tableau For each iteration (including the iterations in initialization phase): (ii) the iteration number (iii) the pivot selected (iv) the tableau after updating ==> More details of the input format will be released shortly. ____________________________________________________________________________ ____________________________________________________________________________ OTHER ITEMS PROGRAMMING LANGUAGES: Use C, C++, OR JAVA. (Other languages require approval of the instructor and TA.) COMMENTING: Your program should have clear comments. We will be looking at your source code. WHAT TO HAND IN: Details will be posted shortly, discussing how to submit your program, and which files you should submit, etc. SHARED WORK: You may work individually or in a team of two. Everyone in a group gets the same grade. GRADING: Each of the two tools is worth 50% of the total project grade. EXAMPLES: You should try out your program on examples of your own. We are also releasing a couple of examples soon for you to try out. OTHER: Additional comments and clarifications will be broadcast. ____________________________________________________________________________