CSEE4823x
Handout #34c
Prof. Steven Nowick
November 22, 2016

Project #2: FAQ (frequently-asked questions)

The following summarizes some questions/answers and clarifications
about the project.

---------------------------------
LISTED FROM MOST RECENT TO OLDEST
---------------------------------

======================================================================
12/10/16
======================================================================
======================================================================

-------------------------
Handling Incorrect Inputs
-------------------------

Q. Should we handle incorrect inputs on "Data_IN", "mode" or "res"?

A. NO. Assume only correct inputs are supplied.

------------------------------
Chaining and Design Complexity
------------------------------

Q. How much chaining should I use?

Reminder: chaining means to concatenate two combinational operators,
together forming a composite function unit, when allowed, i.e. if I
indicate that their performance is fast enough that the two in sequence
can fit into 1 clock cycle. (See handout #34 and FAQ below.)

A. Ideally almost none! Chaining adds complexity, but it can be a
useful optimization. As long as you meet the required system timing,
you can get full credit. Hence, your goal is to have the design *as
simple as possible*, while still meeting the target performance.

AIM AS MUCH AS POSSIBLE FOR SIMPLE OPERATIONS AND FLOW IN YOUR
GENERALIZED-ASM. Avoid chaining except when absolutely needed. A
simpler design is easier to understand and debug, more portable, and
likely will have lower power.

--------------
Decision Boxes
--------------

Q. Do decision boxes ever have operations in them?

A. NEVER! See Handout #30(a)/(b). Each decision box only has a single
*input variable*. As in the #30(a) example, these can be of two types:
(i) control inputs, from the external environment to the FSM (e.g.
"Start"), and
(ii) status signals, from the datapath to the FSM (e.g. "Data_LSB",
"Data != 0").

Q. In Handout #30(a), isn't the decision box with "Data != 0" showing
an operation inside the decision box?

A. NO. The handout and lecture are quite explicit. "Data != 0" is
simply the long name of a single status signal. (See final
micro-architecture.) THE OPERATION TO PRODUCE THIS RESULT (i.e.
comparison) IS INSERTED IN THE PRECEDING STATE BOX.

Q. Which boxes in the generalized-ASM can have operations?

A. Only state boxes.

======================================================================
12/6/16
======================================================================
======================================================================

Q. WHAT TO HAND IN: for steps #1 (pseudo-code) and #2 (generalized
ASM), should we add comments, notations, clarifications, etc. to make
our submitted projects more readable and understandable?

A. YES! These are *basic requirements*, as we went over explicitly for
Project #1. In particular, much like Handout #30(a)/(b) (but in more
detail, because you are handling a more complex subsystem):

(i) list all basic variables/storage units (*before* step #1):

*BEFORE* PRESENTING YOUR PSEUDO-CODE, YOU SHOULD NEATLY LIST EACH
VARIABLE YOU WILL USE. This is separate from listing resources -- they
have not been assigned yet! Instead, you are simply listing the
abstract *variables* you will be using in your specification, including
their size and field structure.

For ex., the variables in handout #30(a)/(b) included Input, Output,
Data and Ocount.

For each variable, indicate: total # of bits, any breakdown into
individual fields, subfield names, etc.

DRAW A SIMPLE FIGURE for each storage variable, indicating the above.
You can show any breakdown into fields in the figure. These drawings
come *before* the datapath allocation step: they are abstract figures
of the storage units (before you have determined which precise unit in
the library the variable maps to).
The above is part of general documentation, whether for TA's,
management or design review. This documentation makes it easier for
others to follow your pseudo-code and generalized ASM.

(ii) commenting and annotating your pseudo-code (step #1)

You should include comments within your pseudo-code, to document the
major steps, what each part does, etc. In addition to inserted text
comments, you can also include brackets around groups of instructions,
as in handout #30(a)/(b), to highlight the larger parts of your code.

(iii) commenting and annotating your generalized ASM (step #2)

As in Project #1, DRAW DOTTED LINES AROUND GROUPS OF ASM BLOCKS that
form steps, AND LABEL THEM. For example, you can group ASM blocks that
handle special symbols, indicate where initialization occurs, where
output occurs, what operations occur in which region (e.g. sine vs.
cosine vs. natural exponentiation), where an inner loop body occurs and
what it does. All such neat *hierarchical* documentation adds to the
readability of your solution, helps to present your solution more
clearly, and is part of professional practice.

======================================================================
======================================================================
12/5/16
======================================================================

---------------------------
Identifying a leading 1 bit
---------------------------

Q. Any hints on how to find the highest priority (i.e. leftmost) 1 bit
in a number, in < 1 cycle?

A. Consider using a priority encoder, in the combinational library.

======================================================================
12/1/16
======================================================================

--------------------------------------------------------------------
Fitting a simple combinational operation + register store in 1 cycle
--------------------------------------------------------------------

Q. Can we have a combinational adder produce a sum, and then load it
into a simple register, all in 1 clock cycle?

A. Of course. This is a basic RTL operation. There are 2 steps in a
typical clock cycle: (i) combinational operation, followed by (ii)
sequential register storage. Since I indicate that a combinational
adder completes in less than 1 cycle, the adder is step (i) above. It
goes without saying that the result can be written to a register (step
(ii) above) at the end of that cycle (i.e. the start of the next clock
cycle).

======================================================================

-------------------------------------------------------------------
"Resource Allocation" step: handling a combinational function unit
+ sequential storage
-------------------------------------------------------------------

Q. Suppose we have an RTL statement which implies a combinational
function block (e.g. the adder implied by "x:= y + z"), followed by
sequential storage. How do we list the resource allocation for the
adder?

A. We covered this in class recently. If you have a simple RTL
statement, such as "x:= y + z", you will clearly be allocating two
components together: (i) storage for destination variable "x", and
(ii) attached combinational function unit (e.g. adder) to compute the
operation. In your resource allocation for variable x, you should
*also* allocate the attached combinational adder block (which will be
appropriately attached as input to x).

Note that Handout #30(a) just had simpler cases: "Ocount := Ocount + 1"
gets mapped to a *single* unit which combines storage and operation
(i.e. increment capability), namely an up/down counter with parallel
load. Likewise "Data := Data >> 1" was mapped to a *single* unit which
combines storage and operation, namely a shift register with parallel
load.

However, in the above example (x:= y+z), the operation is more complex
(i.e. binary addition), so you will need to allocate two separate
components: (i) a sequential storage component (for variable x write),
and (ii) a combinational function unit (for + operation).

======================================================================

----------------------------
"mode" input: data validity
----------------------------

Q. You indicate that "Data_in" and "res" inputs should both be assumed
to be valid for 2 clock cycles (long enough for them to be successfully
stored). What about the "mode" input?

A. Same answer. Assume "mode" is actually valid for *2 clock cycles*
and therefore can be correctly stored in a register.

======================================================================

--------------------------------------------------------
ASMS: cascaded decision boxes, multi-way decision boxes
--------------------------------------------------------

Q. Can an ASM have a multi-way decision box, or must it always be
2-way?

A. It must be only 2-way. It is making a *binary* decision on a single
Boolean variable. (See B/V reading on ASM's, as well as Handout
#30(a)(b).)

Q. How do we form multi-way decisions? If we have cascaded decision
boxes, off of a single state, will these require multiple clock cycles?

A. To form multi-way decisions, follow what I have gone over in a
couple of class lectures: use cascaded binary decision boxes. So, if
you want to make a choice based on the values of inputs X and Y, the
first box makes a binary choice based on one of the variables (say X),
and then its output paths can lead (if needed) to cascaded new decision
boxes which make a second choice based on the other variable (say Y).

As indicated in class lectures, and in the B/V reading, cascaded
decision boxes (or *any* decision boxes) TAKE NO TIME! They are simply
neater ways to document a chain of decisions on inputs, from a given
state (i.e. state box).
So, if you have a cascade of decision boxes emanating from state S1,
where one path through the decision boxes leads to S2, there is still
*only 1 clock cycle* in going from S1 to S2. Only a state change, from
a state box to the next state box, takes a clock cycle.

======================================================================

Q. You indicate we can use up to a 64-bit integer combinational
multiplier. What does that mean?

A. Same as when we covered multipliers in the earlier homework
assignment. A 64-bit integer combinational multiplier means it can take
two unsigned 64-bit integer operands, and produce the resulting
unsigned 128-bit product.

======================================================================

Q. Can we use 128-bit registers?

A. NO. See Handout #34, only 64-bit registers can be used. If you need
to store 128 bits, you can use and coordinate multiple smaller
registers.

======================================================================
======================================================================
11/28/16
======================================================================

-----------------------------------
barrel shifters vs. barrel rotators
-----------------------------------

Q. In Handout #32, I see a figure for a barrel right rotator, are we
allowed to use barrel right (or left) shifters?

A. See below (11/22 FAQ). Absolutely. Also, read carefully. This
section of #32 is called "shifters and rotators". As the text says,
Fig. 5.24 "demonstrate[s] one possible design for a barrel shifter in
which, for the sake of simplicity, we have limited the shifter to ...
right rotation." Use common sense with my guidelines. The components in
this section include both barrel shifters and rotators.

--------------------
FP 0 and FP infinity
--------------------

Q. The handout says we do not need to consider FP infinity, but do need
to consider FP 0. Do we need to model "subnormal numbers"?

A. The handout says to ignore underflow and overflow, so the answer is
no: do not use "subnormal" representation to represent extremely small
numbers. However, you still have to handle cases where you have 0, or
numbers that are too small (see below).

Q. For FP 0, what do we need to consider?

A. (See also 11/22 FAQ below) There are two cases: (i) handling 0, and
(ii) handling very small numbers.

YOU MUST CORRECTLY HANDLE FP 0, wherever it can arise.

Also, since you are not using subnormal numbers, WHEN FP NUMBERS OCCUR
THAT ARE TOO SMALL TO REPRESENT, SIMPLY REPRESENT THEM AS FP +0, and
handle them correctly throughout.

======================================================================
11/22/16
======================================================================

--------------------------------------------
"Res" input: can it be stored in a register
--------------------------------------------

Q. In the project, the "Res" input is a 4-bit field indicating the
user-specified resolution of the Maclaurin computation. Its value can
range from 5-15. Handout #34 indicates that "data" on the input bus
will be valid for 2 clock cycles (long enough to read). Is "Res"
considered data, and available this long?

A. YES. You can assume that, like the Input FP operand, "Res" is
actually valid for *2 clock cycles* and therefore can be correctly
stored in a register.

======================================================================

--------------------------------------
barrel shifters: left vs. right shift
--------------------------------------

Q. Can we use left/right barrel shifter in our design? In the Gajski
book, only a barrel shifter that implements right-shift is given, but
can we slightly modify this implementation to allow using a single
block to implement both left and right shift? Or should we use separate
blocks, one for left shift, another for right?

A. You can allocate a barrel shifter that does *either* right shift
*or* left shift, but not both. If you need both, you will need two
barrel shifters: one for right shift and one for left shift.

======================================================================

Q. multiple sources for the same variable: When doing the datapath for
the ASM, if we find a particular variable has multiple loads from
multiple sources, should we allocate a MUX with its own control signals
to select the source(s)?

A. Yes. I will cover in class. For example, suppose you have a
generalized-ASM, which includes the following statements:

    ...
    x := 0
    if (y<0) then
        x := y
    ...
    x := 15

In step #3 of Handout #30(a) ("Allocate Datapath Blocks", also called
"resource allocation"), you would not only:

(i) allocate a sequential component for variable x, which (in above
example) could both perform a 'load' and a 'hold' operation (such as a
basic register with parallel load, looking into Handout #33 for an
appropriate component); but also

(ii) allocate an input MUX which (in above example) takes 3 inputs and
has 1 output (such as a 4-to-1 MUX), to attach to the data input port
of storage unit x, to allow you to select one of the 3 possible
operands (0, y, 15); this MUX would also have 'select' signals as its
control signals.

That is, in Datapath Allocation, for the above code fragment, you would
allocate both an appropriate storage unit for x, and an attached input
MUX. Each typically has attached control signals, which should be
clearly indicated.

======================================================================

Q. How different can my pseudo-code (step #1) be from my generalized
ASM (step #2)?

A. The pseudo-code should be very close in level to the
generalized-ASM. See handout #30(a)/(b) as an example. The idea is that
the pseudo-code already is broken into simple RTL operations, that map
*directly* to the generalized ASM.
Of course, the pseudo-code is behavioral, therefore inherently
higher-level, and doesn't have a notion of state or clock cycle (unlike
the generalized ASM). Nonetheless, as indicated in class, the
operations in pseudo-code and generalized ASM should be at a similar
level, and the former ones correspond simply and directly to the latter
ones.

======================================================================

Q. WHAT TO HAND IN: for steps #1 (pseudo-code) and #2 (generalized
ASM), should we add comments, notations, clarifications, etc. to make
our submitted projects more readable and understandable?

A. YES! These are basic requirements, as we went over explicitly for
Project #1. In particular, much like Handout #30(a)/(b) (but in more
detail, because you are handling a more complex subsystem):

(i) listing basic variables/storage units (before step #1):

*Before* presenting your pseudo-code, you should neatly list each
variable you will use. For ex., the variables in handout #30(a)/(b)
included Input, Output, Data and Ocount.

For each variable, indicate total # of bits, any breakdown into
individual fields, subfield names, etc.

I am requiring you to DRAW A SIMPLE FIGURE for each storage variable,
indicating the above. You can show any breakdown into fields in the
figure. These drawings come *before* the datapath allocation step: they
are abstract figures of the storage units (before you have determined
which precise unit in the library the variable maps to).

The above is part of general documentation, whether for TA's,
management or design review. This documentation makes it easier for
others to follow your pseudo-code and generalized ASM.

(ii) commenting and annotating your pseudo-code (step #1)

You must include comments within your pseudo-code to document the major
steps, what each part does, etc.
In addition to inserted text comments, you can also include brackets
around groups of instructions, as in handout #30(a)/(b), to highlight
the larger parts of your code.

(iii) commenting and annotating your generalized ASM (step #2)

As in Project #1, draw dotted lines around groups of ASM blocks that
form steps, and annotate them. For example, you can group ASM blocks
that handle special symbols, indicate where initialization occurs,
where output occurs, what operations occur in which region (e.g. sinh
vs. cosh), where an inner loop body occurs and what it does. All such
neat *hierarchical* documentation adds to the readability of your
solution, helps to present your solution more clearly, and is part of
professional practice.

======================================================================

================================================
RTL operations: pseudo-code and generalized ASM
================================================

Your RTL operations should be simple. The idea is that you are breaking
down a relatively complicated specification into a series of *very
simple steps*.

In the pseudo-code, these steps are 'behavioral' (i.e. no clock cycles
indicated). In the generalized ASM, these steps include clock cycles,
and hence are "scheduled": each state box in your generalized ASM
corresponds to an FSM state, and all operations in the state occur in
the same clock cycle.

As a result, in the 'resource allocation step', each micro-operation
will be mapped to a very simple unit in the component library (i.e.
Handouts #32 and #33).

DO NOT LOOK FOR COMPLEX OPERATIONS AND UNITS! As I have indicated,
unless you get special permission from me, you must only use components
in the given library.

Also, if you start working on the problem by trying to design the
hardware and architecture, you are off track! Follow handout
#30(a)/(b): your focus should be entirely on the top-level pseudo-code
and generalized ASM first.
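
To illustrate how simple RTL steps line up with state boxes and clock
cycles, here is a minimal Python sketch of a 1's counter. It is only an
illustration: the state names (LOAD, TEST, INC, SHIFT, DONE), the
function name and the 8-bit default width are assumptions made for this
sketch, and the actual ASM in Handout #30(a)/(b) may be organized
differently. Only the variable names (Data, Ocount) and the two status
signals are taken from this FAQ.

```python
def ones_counter(value, width=8):
    """Simulate a hypothetical 1's-counter ASM, one state box per clock cycle."""
    mask = (1 << width) - 1
    data, ocount, cycles = 0, 0, 0
    state = "LOAD"
    while state != "DONE":
        cycles += 1                  # each state box costs one clock cycle
        if state == "LOAD":
            data = value & mask      # Data := Input
            ocount = 0               # Ocount := 0
            state = "TEST"
        elif state == "TEST":
            # No register changes in this state. The cascaded binary
            # decisions below only test status signals and add no cycles.
            if data == 0:            # status signal "Data != 0" is false
                state = "DONE"
            elif data & 1:           # status signal "Data_LSB" is 1
                state = "INC"
            else:
                state = "SHIFT"
        elif state == "INC":
            ocount = ocount + 1      # Ocount := Ocount + 1
            state = "SHIFT"
        elif state == "SHIFT":
            data = data >> 1         # Data := Data >> 1
            state = "TEST"
    return ocount, cycles


count, cycles = ones_counter(0b1011)
print("ones:", count, "cycles:", cycles)
```

Every state performs at most one simple operation per variable, so each
step maps to a plain library component (e.g. a counter or a shift
register). The separate TEST state is deliberately unoptimized, to keep
the timing of the decisions easy to follow.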
---------------------
basic RTL operations:
---------------------

Typical RTL operations are very simple, such as are included in Handout
#30(a)/(b) (1's counter). Each can be performed within 1 clock cycle,
and has a direct match to associated hardware units:

    x := y + z
    x := y - z
    x := y >> 3
    h := j
    j := j + 1
    z := (x > y) [sequential]   or simply   (x > y) [combinational]
    ... etc.

Most of your RTL operations will look like the above.

-------------------------------------------
allowed multi-step operations ('chaining'):
-------------------------------------------

See details in requirements of Handout #34. I indicate that in a few
simple cases, you can have 2 combinational operations, one after the
other, in the same clock cycle. You should *not* assume a single
'merged' datapath block. Instead this involves 2 separate blocks, and
you must identify *both* operations explicitly in your pseudo-code and
generalized ASM.

For example, I have allowed a combinational shift before (or after) an
addition. To specify this in pseudo-code and the generalized ASM, you
simply list a composite operation: e.g.

    x := (a << 1) + b    -- shifts a before adding to b
    y := (a - b) >> 3    -- shifts result after subtracting a - b

In this case, since you are doing combinational shifting instead of
sequential shifting, you would allocate a barrel shifter in the
resource allocation for the shift part of the operation. Both
combinational components would be explicitly allocated in the 'resource
allocation step', and they would be tied together appropriately for the
given composite (i.e. chained) operation.

However, very few operations can be chained under my guidelines. See
explicit discussion in Handout #34. Normally, an RTL operation is
simply a single step!

======================================================================

================================
SPECIAL FP OPERATIONS: SIGNED 0
================================

Q. How do I do operations with the special reserved FP symbol, +/-0?

A. First, for your operand, if it is 0, assume it can be treated in FP
as +0. So you will never be adding two -0 operands. Of course, the
integer operand is never infinity.

Given these constraints, the relevant operations with signed 0 are:

For any N:

    +0 + N = N
    -0 + N = N
    +0 * N = +0
    -0 * N = -0

======================================================================
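
The signed-0 table can be sanity-checked against ordinary
floating-point arithmetic. The short Python sketch below is only such a
check, not part of the project. Python floats follow IEEE 754, where
the sign of a product is the XOR of the operand signs, so the check
uses a positive nonzero N only; the value 2.5 and the helper name
sign_of are arbitrary choices for illustration. For all other cases,
follow the table given in this FAQ.

```python
import math


def sign_of(x):
    """Return '+' or '-' for a float, distinguishing +0.0 from -0.0."""
    return "-" if math.copysign(1.0, x) < 0 else "+"


n = 2.5  # arbitrary positive, nonzero operand (assumption for this check)

results = {
    "+0 + N": 0.0 + n,
    "-0 + N": -0.0 + n,
    "+0 * N": 0.0 * n,
    "-0 * N": -0.0 * n,
}

for name, r in results.items():
    # Print sign and magnitude separately so that -0.0 is clearly visible.
    print(name, "=", sign_of(r) + str(abs(r)))
```

Comparing -0.0 == 0.0 returns True in Python, which is why the helper
inspects the sign with math.copysign instead of using a comparison.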