____________________________________________________________________________ CSEE E6861y Handout #34e Prof. Steven Nowick April 9, 2016 ____________________________________________________________________________ ____________________________________________________________________________ FINAL CAD MINI-PROJECT ----------------------------------------------------- RETIMING TOOL: INPUT AND OUTPUT FORMAT REQUIREMENTS ----------------------------------------------------- This handout gives details on the input and output requirements and assumptions for the final CAD mini-project. While the handout is long, it basically gives you simple and precise instructions on input file format, and how to break the output results neatly into a set of separate files. ____________________________________________________________________________ ____________________________________________________________________________ Handout #34b lists a top-level view for input and output of your tool. This information is repeated below, but with more details on format, requirements and assumptions. ---------------------------------------------------------------------------- =============================================== Specification Model: Synchronous Network Graph =============================================== Use the more complete definition from LS, p. 8 (rather than DM definition): G = (V, E, d, w) where: - V is the set of vertices, - E is the set of edges, - d is a vector assigning a delay to each vertex, and - w is a vector assigning a weight to each edge. Assume each delay (d_i) and each weight (w_j) is a non-negative integer. Also, assume that the graph has a special vertex, v_0, which is the 'source' or 'host'. In the correlator example, it is shown as "v_h". Assume this vertex v_0 ALWAYS HAS A DELAY OF 0 (i.e. d_0 = 0), and NO RETIMING WILL BE DONE ON THIS VERTEX (i.e. r_0 = 0). ===================================================================== ===================================================================== ----------------------------------------------- INPUT FILE FORMAT: DETAILS/ASSUMPTIONS/EXAMPLE ----------------------------------------------- ========= overview: ========= Since you will be implementing two distinct tools (or two modes of a single tool), the input formatting requirements for each are described separately. The two tools have different optimization targets: Part I. minimum cycle time Part II. minimum area under a given cycle-time constraint For Part I, your tool should take one input argument -- the file name, where the initial synchronous network graph G is located. For Part II, your tool should read in two input arguments -- the synchronous network graph (as above), and a cycle time constraint. Your program should read these arguments from the *command line*. ================ input file name: ================ The synchronous network G will be given in the text file format below. As an example, the format is illustrated on the initial (un-retimed) correlator from LS fig. 3 and DM fig. 9.8. EXAMPLE: for the 'correlator', assume the input file name is: correlator-in.txt ======= header: ======= The file header will include 3 lines, each starting with a keyword, listing in order: (i) name of the example, (ii) total the number of vertices (not counting the 'source' or 'host' vertex), and (iii) the initial vector of delays on each vertex (non-negative integers only), as an ordered list (in same order as vertices) ------------------------------ EXAMPLE: unretimed correlator ------------------------------ .name correlator .n 7 # total number of vertices, but excluding v_0 .d 3 3 3 3 7 7 7 # vertex weights for v_1 to v_7, listed in order NOTE: the correlator has vertices v0-v7 (alternatively, v_h host, followed by v_a to v_g). Hence it has 8 vertices. Since the host vertex is always v_0, we list the # of other vertices as *7*. Also, the file can include a comment "#...." at the end of any line. ================ body and footer: ================ The graph body describes the synchronous graph structure. It begins with a keyword '.g'. In particular, each edge in the graph is listed on a separate line, along with its edge weight, in the form: i j w_ij for an edge from i to j with edge weight w_ij. The body terminates with a footer: .e ------------------------------ EXAMPLE: unretimed correlator ------------------------------ .g #start of graph body 0 1 1 1 2 1 2 3 1 3 4 1 4 5 0 5 6 0 6 7 0 7 0 0 1 7 0 2 6 0 3 5 0 .e #footer NOTE: the input format for a graph G should allow *ANY ORDER* of entry for the rows. Hence, in the above example, if the same rows are reordered, your tool should still be able to read it in, and produce the same graph structure. Also, assume the input file will have: (i) no space at the beginning and end of each row, throughout the file (ii) no space line at the beginning and end of the file (iii) no space line between any of the two rows ============================================ (b) cycle-time constraint (for Part II only) ============================================ For Part II only, as indicated above, the target cycle time "phi" is given as the 2nd command line argument. It must be a positive integer. By assumption, your tool can assume that only feasible cycle times will be provided. That is, you do *not* need to handle a cycle time that is too small, i.e. over-constrained. ===================================================================== ===================================================================== ---------------------------------------- OUTPUT FILE FORMAT: DETAILS/ASSUMPTIONS ---------------------------------------- ========= overview: ========= Since you will be implementing two distinct tools (or two modes of a single tool), the output formatting requirements for each are described separately. The two tools have different optimization targets: Part I. minimum cycle time Part II. minimum area under a given cycle-time constraint --------------------------------- running the tool: 2 output modes --------------------------------- Your tool must run in TWO MODES: "normal" and "verbose". In NORMAL MODE: the tool prints basic results. In VERBOSE MODE: the tool in addition prints detailed intermediate results. ---------------------------------------------------------------------------- ======================================== PART I: MIN-CYCLE-TIME RETIMING PROBLEM ======================================== =================== (a) output overview =================== The main output of your tool will be: 0. the initial area of the synchronous network graph (i.e. total number of registers) before retiming 1. the optimal retiming vector "r" (with zero and positive elements) 2. the resulting synchronous network graph after retiming with vector "r" 3. the corresponding final clock cycle time "phi_opt", and final area (i.e. total number of registers), after retiming In additional, you will output some intermediate results: 4. W and D matrices 5. initial clock cycle time "phi_init" (computed only using W and D matrices -- see discussion above) 6. when computing W and D, the internal matrix that is updated in Floyd-Warshall algorithm after *each* iteration 7. a sorted array for the elements in matrix D, after you run a sorting algorithm 8. some intermediate results for the FEAS algorithm: Given each run of FEAS, with a desired cycle time "phi", after *each iteration*: (i) the combinational component graph of the network (see DM p. 470) (ii) the data-ready time (\Delta(v)) for each vertex v after running CP on the graph obtained in (i) above (iii) the list of vertices and their data-ready times, which do not satisfy the desired cycle time "phi" ========================= (b) output files: summary ========================= ----------- Items #0-3: ----------- The main output of items #0-3 above are written to 2 files: xxxxx-part1-summary.txt ==> items #0, 1, and 3 xxxxx-part1-CDFG-output.txt ==> item #2 where it is assumed that the input file, xxxxx-in.txt, provided the initial specification. For example, for input file "correlator-in.txt", the resulting main output files are: "correlator-part1-summary.txt", and "correlator-part1-CDFG-output.txt". Item #0, 1, 2, 3: are produced in BOTH NORMAL AND VERBOSE MODES. ----------- Items #4-8: ----------- The additional output results of items #4-8 above are written to 3 files: xxxxx-part1-WD.txt ==> items #4, 5 and 7 xxxxx-part1-Floyd-Warshall.txt ==> item #6 xxxxx-part1-FEAS.txt ==> item #8 where it is assumed that the input file, xxxxx-in.txt, provided the initial specification. Items #4, 5 and 7: are produced in BOTH NORMAL AND VERBOSE MODES. Items #6 and 8: are produced *only* in VERBOSE MODE. ========================================= (c) output format: detailed requirements ========================================= ------------------------------------------------- #0. initial area of the synchronous network graph ------------------------------------------------- Write the initial area result to the file . NOTE: the area is defined as the total # of registers. ----------------------------------------------------------------- #1. optimal retiming vector "r" (with zero and positive elements) ----------------------------------------------------------------- Write the resulting optimal re-timing vector to the file , the same file as #0 above. Separate the vector into multiple rows in sequence. EACH ROW SHOULD CONTAINS 10 ELEMENTS. NOTE: the re-timing vector starts with v_0 (v_h). The (i+1)th element corresponds to vertex v_i. ------------------------------------------- #2. the resulting synchronous network graph ------------------------------------------- Write the resulting optimal synchronous network graph to the file . Write the re-timed graph, using the same file format used for the initial unretimed graph (see input section). However, you have to *sort* the rows in the body in the increasing manner, first by v_i, then by v_j. ------------------------------------------------ #3. the final optimal clock cycle time "phi_opt" ------------------------------------------------ Write the resulting optimal cycle time to the file , the same file as #0 above. -------------------- #4. W and D matrices -------------------- Write the W and D matrices to the file . Print out W and D matrices in a neat and readable form, with rows and columns labeled with vertex #'s. An example is shown in LS p. 14. --------------------------------------- #5. initial clock cycle time "phi_init" --------------------------------------- Write the initial cycle time "phi_init" to the file , the same file as #4 above. ----------------------------------------------------------- #6. the internal matrix updated in Floyd-Warshall algorithm ----------------------------------------------------------- NOTE: PRINT OUT IN *VERBOSE MODE ONLY*. Write the intermediate results for Floyd-Warshall algorithm to the file . Floyd-Warshall algorithm uses dynamic programming and computes pair-wise shortest paths in a graph. In our algorithm, it creates two initial matrices, for W and D respectively, and updates them simultaneously through a *single* run of the algorithm. The Floyd-Warshall algorithm has |V| iterations, where |V| is the # of nodes in the graph. After iteration k, it finds the shortest path using only vertices labeled from 1 to k. Print out the initial matrices W0 and D0 and the matrices updated after each iteration, i.e. Wi and Di (where i = 1 to |V|). You can pick your own format, but it has to be neat and readable. ------------------------------------------------ #7. a sorted array for the elements in matrix D ------------------------------------------------ Write the sorted array for elements in matrix D to the file , the same file as in #4 above. Separate the vector into multiple rows in sequence. EACH ROW SHOULD CONTAINS 10 ELEMENTS. NOTE: You should sort the elements in an increasing manner. Also, eliminate any duplicate elements. -------------------------------------------- #8. intermediate results for FEAS algorithm -------------------------------------------- NOTE: PRINT OUT IN *VERBOSE MODE ONLY*. Write the intermediate results for FEAS to the file . For *each run* of FEAS with the selected cycle time, after *each iteration*, print out: (a) the combinational component graph of the network, starting with the initial one (b) data-ready time (Delta (v)) for each node after running CP on the combinational component graph obtained in (a) (c) the list of vertices that their data-ready time do not satisfy the current desired cycle time format for (a) -------------- Examples of combinational component graph are on DM p. 470, figures a2, b2 and c2. Combinational component graph partitions the corresponding synchronous network graph into combinational components. The weights of edges in the graph are always 0. Save the combinational component graphs using the same format as the resulting synchronous network graph. NOTE: You still need to sort the rows in the body. format for (b) -------------- Separate the output into multiple rows in sequence. EACH ROW SHOULD CONTAIN 10 ELEMENTS. The (i+1)th element corresponds to the data-ready time for vertex i. format for (c) -------------- Separate the output into multiple rows in sequence. EACH ROW SHOULD CONTAIN 10 VERTICES. ------------------------ Please separate the results of *each run* of FEAS and *each iteration* of the same run neatly using separation lines. ---------------------------------------------------------------------------- =============================================================== PART II: MIN-AREA RETIMING PROBLEM UNDER CYCLE-TIME CONSTRAINT =============================================================== =================== (a) output overview =================== The main output of your tool will be: 0. the initial area of the synchronous network graph (i.e. total number of registers) before retiming 1. the optimal retiming vector "r" (with zero and positive elements), assuming that the given cycle-time is feasible 2. the resulting synchronous network graph after retiming with vector "r" 3. its corresponding final area (in terms of the total # of registers), after retiming In additional, you will output some intermediate results: 4. the c vector, indicating the differences between the in-degree and out-degree of each vertex (see DM book, p. 470) 5. list each basic constraint in the initial linear program (see DM book, mid p. 471), for the clock cycle constraint provided 6. show the reduced list of constraints of #5 above -- after applying the optimization presented in DM and LS (see discussion above) 7. some intermediate results for simplex algorithm: (i) the initial tableau For each iteration (including the iterations in initialization phase): (ii) the iteration number (iii) the pivot selected (iv) the tableau after updating ========================= (b) output files: summary ========================= ----------- Items #0-3: ----------- The main output of items #0-3 above are written to 2 files: xxxxx-part2-summary.txt ==> items #0, 1, and 3 xxxxx-part2-CDFG-output.txt ==> item #2 where it is assumed that the input file, xxxxx-in.txt, provided the initial specification. Item #0, 1, 2, 3: are produced in BOTH NORMAL AND VERBOSE MODES. ----------- Items #4-7: ----------- The additional output results of items #4-7 above are written to 4 files: xxxxx-part2-c-vector.txt ==> item #4 xxxxx-part2-complete-constraint.txt ==> item #5 xxxxx-part2-reduced-constraint.txt ==> item #6 xxxxx-part2-simplex.txt ==> item #7 Items #4, 5 and 6: are produced in BOTH NORMAL AND VERBOSE MODES. Item #7: is produced *only* in VERBOSE MODE. ========================================= (c) output format: detailed requirements ========================================= ------------------------------------------------- #0. initial area of the synchronous network graph ------------------------------------------------- Write the initial area result to the file . ----------------------------------------------------------------- #1. optimal retiming vector "r" (with zero and positive elements) ----------------------------------------------------------------- Write the resulting optimal re-timing vector to the file , the same file as #0 above. Separate the vector into multiple rows. Each row contains 10 elements. NOTE: the re-timing vector starts with v_0 (v_h). The (i+1)th element corresponds to vertex v_i. ------------------------------------------- #2. the resulting synchronous network graph ------------------------------------------- Write the resulting optimal synchronous network graph to the file . Write the re-timed graph, using the same file format used for the initial unretimed graph (see input section). However, you have to *sort* the rows in the body in the increasing manner, first by v_i, then by v_j. -------------------------------- #3. the corresponding final area -------------------------------- Write the resulting optimal area to the file , the same file as #0 above. ---------------- #4. the c vector ---------------- Write the c vector to the file . Separate the vector into multiple rows in sequence. EACH ROW SHOULD CONTAIN 10 ELEMENTS. The c vector indicates the difference between the in-degree and out-degree of each vertex (see DM book, p. 470). ----------------------------------------------------- #5. list of a complete set of constraint inequalities ----------------------------------------------------- Write the complete set of constraint inequalities to file . The complete set of basic constraints are on DM. mid p. 471, which are identical to constraints (9.4) and (9.5). Each row contains one inequality. You should sort rows using v_i first, in an increasing manner, then v_j, then the # on the right side (smaller # first). An example is shown below: v1 - v7 <= -1 v1 - v12 <= 3 v1 - v12 <= 5 v4 - v3 <= 19 ... etc. Use the same format as in the example, including spaces. ---------------------------------------------------- #6. list of a reduced set of constraint inequalities ---------------------------------------------------- Write the reduced set of final constraint inequalities to file . List the reduced list of constraints of #5 above, after applying the optimization presented in DM and LS. Use the same format as the in #5 above. Also, be sure to sort the inequalities. ---------------------------------------------- #7. intermediate results for simplex algorithm ---------------------------------------------- NOTE: PRINT OUT IN *VERBOSE MODE ONLY*. Write the intermediate results for simplex algorithm to the file . You need to print out the following: (a) the initial tableau The initial tableau can have a feasible or infeasible initial basic solution. Construct the tableau as follows: The 1st row corresponds to the 1st constraint inequality as in #6; the 2nd row corresponds to the 2nd inequality, and so on. DO NOT SWITCH THE ORDER OF ROWS IN THE TABLEAU. Print out the tableau using a neat and readable format. Examples can be found in zweigmedia website. Include *every* row and column in the example. Label starred rows properly if it has an infeasible basic solution. Also, KEEP ALL PARTING LINES. For each iteration after you construct the initial tableau (including the iterations in initialization phase): (b) the iteration # Do *not* separate the iterations in the initialization phase. Take all the iterations as a whole, starting from iteration #1. (c) the pivot selected First, highlight the pivot column by adding parentheses to the label of the pivot column. When there is a tie, alway select the *LEFT* column. Then, compute the 'test ratio' for each row, if applicable, and write them down on the right of each row (see examples in zweigmedia). Finally, highlight the pivot, for example, by adding parentheses to that entry. When there is a tie, always select the *UPPER* entry in the pivot column. These information in step (c) can be highlighted in the tableau of the previous iteration. You don not need to copy the tableau from the previous iteration. (d) the tableau after updating Print out the updated tableau by clearing the pivot column. Then use it to highlight the position of pivot in the next iteration, following the steps in (c). Please separate the results of *each iteration*neatly using separation lines. ____________________________________________________________________________ ____________________________________________________________________________