CSEE4823x						     Handout #34c
Prof. Steven Nowick					November 22, 2016

      	     Project #2:  FAQ (frequently-asked questions)
	     		

The following summarizes some questions/answers and clarifications
about the project.

---------------------------------
LISTED FROM MOST RECENT TO OLDEST
---------------------------------

======================================================================
12/10/16  
======================================================================

======================================================================

-------------------------
Handling Incorrect Inputs
-------------------------

Q.  Should we handle incorrect inputs on "Data_IN", "mode" or "res"?

A.  NO.  Assume only correct inputs are supplied.

------------------------------
Chaining and Design Complexity
------------------------------

Q.  How much chaining should I use?  

    Reminder:  chaining means to concatenate two combinational operators, 
    together forming a composite function unit, when allowed, i.e. if I 
    indicate that their performance is fast enough that the two in sequence 
    can fit into 1 clock cycle.  (See handout #34 and FAQ below.)

A.  Ideally almost none!  Chaining adds complexity, but it can be a useful
    optimization.  As long as you meet the required system timing, you can
    get full credit.  

    Hence, your goal is to have the design *as simple as possible*, while 
    still meeting the target performance.

    AIM AS MUCH AS POSSIBLE FOR SIMPLE OPERATIONS AND FLOW IN YOUR 
    GENERALIZED-ASM.  Avoid chaining except when absolutely needed.

    A simpler design is easier to understand and debug, more portable,
    and likely will have lower power.

--------------
Decision Boxes
--------------

Q.  Do decision boxes ever have operations in them?

A.  NEVER!  See Handout #30(a)/(b).  Each decision box only has a single
    *input variable*.  As in the #30(a) example, these can be of two
    types:  (i) control inputs, from the external environment to the FSM
    (e.g. "Start"), and (ii) status signals, from the datapath to the
    FSM (e.g. "Data_LSB", "Data != 0").


Q.  In Handout #30(a), isn't the decision box with "Data != 0" showing
    an operation inside the decision box?

A.  NO. The handout and lecture are quite explicit.  "Data != 0" is 
    simply the long name of a single status signal.  (See final 
    micro-architecture.)  THE OPERATION TO PRODUCE THIS RESULT 
    (i.e. comparison) IS INSERTED IN THE PRECEDING STATE BOX.


Q.  Which boxes in the generalized-ASM can have operations?

A.  Only state boxes.


======================================================================
12/6/16  
======================================================================

======================================================================

Q.  WHAT TO HAND IN:  for steps #1 (pseudo-code) and #2 (generalized ASM),
    should we add comments, notations, clarifications, etc. to make 
    our submitted projects more readable and understandable?

A.  YES!  These are *basic requirements*, as we went over explicitly
    for Project #1.

    In particular, much like Handout #30(a)/(b) (but in more detail, 
    because you are handling a more complex subsystem):  

    	(i) list all basic variables/storage units (*before* step #1): 
	    
		*BEFORE* PRESENTING YOUR PSEUDO-CODE, YOU SHOULD 
		NEATLY LIST EACH VARIABLE YOU WILL USE.

		This is separate from listing resources -- they have not
		been assigned yet!  Instead, you are simply listing the
		abstract *variables* you will be using in your specification,
		including their size and field structure.

		For ex., the variables in handout #30(a)/(b) included 
		Input, Output, Data and Ocount.

		For each variable, indicate: total # of bits,
		any breakdown into individual fields, subfield names, etc.

		DRAW A SIMPLE FIGURE for each storage variable, 
		indicating the above.  You can show any breakdown
		into fields in the figure.  These drawings are *before*
		doing the datapath allocation step, they are abstract
		figures of the storage units (before you have determined
		which precise unit in the library the variable maps to).

		The above is part of general documentation, whether for
		TA's, management or design review.   This documentation
		makes it easier for others to follow your pseudo-code
		and generalized ASM.

    	(ii) commenting and annotating your pseudo-code (step #1)
	    
		You should include comments within your pseudo-code, 
		to document the major steps, what each part does, etc.  
		In addition to inserted text comments, you 
		can also include brackets around groups of instructions, 
		as in handout #30(a)/(b), to highlight the larger parts of 
		your code.

    	(iii) commenting and annotating your generalized ASM (step #2)
	    
		As in Project #1, DRAW DOTTED LINES AROUND
		GROUPS OF ASM BLOCKS that form steps, AND LABEL THEM.

		For example, you can group ASM blocks
		that handle special symbols, indicate where initialization 
		occurs, where output occurs, what operations occur in 
		which region (e.g. sine vs. cosine vs. natural exponentiation), 
		where an inner loop body occurs and what it does.

		All such neat *hierarchical* documentation adds to the 
		readability of your solution, helps to present your solution 
		more clearly, and is part of professional practice.

======================================================================

======================================================================
12/5/16  
======================================================================

---------------------------
Identifying a leading 1 bit
---------------------------

Q.  Any hints on how to find the highest priority (i.e. leftmost) 1 bit
    in a number, in < 1 cycle? 
    
A.  Consider using a priority encoder, in the combinational library.
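
    As a rough behavioral sketch (in Python, not hardware), a priority
    encoder reports the index of the leftmost 1 bit, plus a "valid" flag
    for the all-zero case.  The function name, the 64-bit default width
    and the (index, valid) output pair are assumptions for illustration;
    check the library handouts for the actual component interface.

```python
def priority_encode(value, width=64):
    """Behavioral model: return (index, valid) for the leftmost 1 bit.

    index counts from 0 at the LSB; valid is False when value is 0.
    Width and output format are illustrative assumptions.
    """
    # Scan from the most significant bit down; the first 1 found wins.
    for i in range(width - 1, -1, -1):
        if (value >> i) & 1:
            return i, True
    return 0, False
```

    In hardware all bit positions are examined in parallel, which is
    why the result is available combinationally; the loop above only
    models the priority order.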


======================================================================
12/1/16  
======================================================================

   
--------------------------------------------------------------------
Fitting a simple combinational operation + register store in 1 cycle
--------------------------------------------------------------------

Q.  Can we have a combinational adder produce a sum, and then load it 
    into a simple register, all in 1 clock cycle?
    
A.  Of course.  This is a basic RTL operation.
    There are 2 steps in a typical clock cycle:  (i) combinational
    operation, followed by (ii) sequential register storage.
    
    Since I indicate that a combinational adder completes in less than
    1 cycle, the adder is step (i) above.  The result can then be
    written to a register (step (ii) above) at the end of that cycle
    (i.e. the start of the next clock cycle).
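
    The two steps can be sketched behaviorally in Python.  This is only
    an illustrative model: the Register class, its method names and the
    64-bit width are assumptions, not library components.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit datapath width

class Register:
    """Simple register with parallel load (behavioral model)."""
    def __init__(self):
        self.q = 0        # value currently stored
        self._d = None    # value waiting at the data input, if any

    def set_input(self, d):
        # Present a value at the data input; nothing is stored yet.
        self._d = d & MASK64

    def clock_edge(self):
        # Storage happens only at the clock edge.
        if self._d is not None:
            self.q = self._d
            self._d = None

def cycle_add(x, y, z):
    # (i) combinational: the adder output settles during the cycle
    total = (y + z) & MASK64
    x.set_input(total)
    # (ii) sequential: x captures the sum at the end of the cycle
    x.clock_edge()
    return x.q
```

    Note that x keeps its old value until clock_edge() is called, which
    mirrors the register holding its contents until the end of the cycle.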

======================================================================
   
-------------------------------------------------------------------
"Resource Allocation" step:  handling a combinational function unit
	  	      	     	      + sequential storage
-------------------------------------------------------------------

Q.  Suppose we have an RTL statement which implies a combinational
    function block (e.g. the adder implied by "x:= y + z"), followed
    by sequential storage.  How do we list the resource allocation for 
    the adder?
    
A.  We covered this in class recently.
    If you have a simple RTL statement, such as "x:= y + z", you will
    clearly be allocating two components together:  (i) storage for 
    destination variable "x", and (ii) attached combinational function
    unit (e.g. adder) to compute the operation.

    In your resource allocation for variable x, you should *also*
    allocate the attached combinational adder block (which will be
    appropriately attached as input to x). 

    Note that Handout #30(a) just had simpler cases:  

    "Ocount := Ocount + 1" gets mapped to a *single* unit which combines
    storage and operation (i.e. increment capability), namely an
    up/down counter with parallel load.  Likewise "Data := Data >> 1"
    was mapped to a *single* unit which combines storage and operation,
    namely a shift register with parallel load.

    However, in the above example (x:= y+z), the operation is more complex 
    (i.e. binary addition), so you will need
    to allocate two separate components:  (i) a sequential storage
    component (for variable x write), and (ii) a combinational function
    unit (for + operation).

======================================================================

----------------------------
"mode" input:  data validity
----------------------------

Q. You indicate that "Data_in" and "res" inputs should both be assumed
   to be valid for 2 clock cycles (long enough for them to be successfully
   stored).  What about the "mode" input?

A. Same answer.  Assume "mode" is actually valid for *2 clock cycles* and 
   therefore can be correctly stored in a register.  

======================================================================

--------------------------------------------------------
ASMS:  cascaded decision boxes, multi-way decision boxes
--------------------------------------------------------

Q.  Can an ASM have a multi-way decision box, or must it always
    be 2-way?

A.  It must be only 2-way.  It is making a *binary* decision
    on a single Boolean variable.  (See B/V reading on ASM's,
    as well as Handout #30(a)(b).)


Q.  How do we form multi-way decisions?  If we have cascaded
    decision boxes, off of a single state, will these require
    multiple clock cycles?

A.  To form multi-way decisions, follow what I have gone over
    in a couple of class lectures:  use cascaded binary decision
    boxes.  So, if you want to make a choice based on the values
    of inputs X and Y, the first box makes a binary choice based on
    one of the variables (say X), and then its output paths
    can lead (if needed) to cascaded new decision boxes which
    make a second choice based on the other variable (say Y).

    As indicated in class lectures, and in the B/V reading,
    cascaded decision boxes (or *any* decision boxes) TAKE NO TIME!
    They are simply neater ways to document a chain of decisions
    on inputs, from a given state (i.e. state box).

    So, if you have a cascade of decision boxes emanating from state S1,
    where one path through the decision boxes leads to S2, there is
    still *only 1 clock cycle* in going from S1 to S2.
    Only a state change, from a state box to the next state box,
    takes a clock cycle.
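
    One way to picture this is as a next-state function, sketched below
    in Python.  The cascade on X and Y is evaluated as a single
    combinational decision, and the function is applied once per clock
    cycle.  States S3 and S4 are hypothetical names added only to give
    the other paths somewhere to go.

```python
def next_state(state, X, Y):
    """One clock cycle: evaluate the decision cascade leaving `state`."""
    if state == "S1":
        # First decision box: binary test on X.
        if X:
            # Cascaded decision box: binary test on Y.
            return "S2" if Y else "S3"
        return "S4"
    # Other states simply return to S1 in this toy example.
    return "S1"
```

    A single call moves from S1 to S2 (when X and Y are both 1),
    regardless of how many decision boxes lie on the path.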

======================================================================

Q. You indicate we can use up to a 64-bit integer combinational multiplier.
   What does that mean?

A. Same as when we covered multipliers in the earlier homework
   assignment.  A 64-bit integer combinational multiplier means it
   can take two unsigned 64-bit integer operands, and produce the 
   resulting unsigned 128-bit product.
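
   As a quick check of the widths, the Python sketch below models the
   multiplier behaviorally and confirms that the largest possible
   product fits in 128 bits.  The function name mul64 is an assumption
   for illustration.

```python
def mul64(a, b):
    """Behavioral model of an unsigned 64x64 -> 128-bit multiplier."""
    assert 0 <= a < (1 << 64) and 0 <= b < (1 << 64)
    return a * b  # Python integers are unbounded, so nothing is truncated

largest = (1 << 64) - 1          # largest unsigned 64-bit operand
product = mul64(largest, largest)
```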

======================================================================

Q. Can we use 128-bit registers?

A. NO.  See Handout #34, only 64-bit registers can be used.
   If you need to store 128 bits, you can use and coordinate multiple
   smaller registers.
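
   For instance, a 128-bit value can be held as a high half and a low
   half in two 64-bit registers.  The Python sketch below shows the
   split and the reassembly; the names split128, join128, hi and lo
   are hypothetical.

```python
MASK64 = (1 << 64) - 1

def split128(value):
    """Split a 128-bit value into (hi, lo) 64-bit halves."""
    return (value >> 64) & MASK64, value & MASK64

def join128(hi, lo):
    """Reassemble the 128-bit value from its two 64-bit halves."""
    return (hi << 64) | lo
```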

======================================================================

======================================================================
11/28/16  
======================================================================

-----------------------------------
barrel shifters vs. barrel rotators
-----------------------------------

Q.  In Handout #32, I see a figure for a barrel right rotator.  Are we
    allowed to use barrel right (or left) shifters?

A.  See below (11/22 FAQ).  Absolutely.  Also, read carefully.
    This section of #32 is called "shifters and rotators".  
    As the text says, Fig. 5.24 "demonstrate[s] one possible design 
    for a barrel shifter in which, for the sake of simplicity, we
    have limited the shifter to ... right rotation."

    Use common sense with my guidelines.  The components in this section
    include both barrel shifters and rotators.

--------------------
FP 0 and FP infinity
--------------------

Q. The handout says we do not need to consider FP infinity, but do
    need to consider FP 0.  Do we need to model "subnormal numbers"?

A. The handout says to ignore underflow and overflow, so the answer is no:
   do not use "subnormal" representation to represent extremely small numbers.  
   However, you still have to handle 0, as well as numbers that are too
   small to represent (see below).

Q. For FP 0, what do we need to consider?

A. (See also 11/22 FAQ below)
   There are two cases:  (i) handling 0, and (ii) handling very small numbers.

   YOU MUST CORRECTLY HANDLE FP 0, wherever it can arise.

   Also, since you are not using subnormal numbers, WHEN FP NUMBERS ARISE THAT
   ARE TOO SMALL TO REPRESENT, SIMPLY REPRESENT THEM AS FP +0, and handle them
   correctly throughout.
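
   A minimal Python sketch of this "flush to +0" rule follows.  It
   assumes a result is described by a hypothetical (sign, biased_exp,
   fraction) triple, that the all-zero exponent is reserved for zero,
   and that a computed biased exponent of 0 or less means "too small";
   the actual field layout is the one given in the project handout.

```python
def flush_to_zero(sign, biased_exp, fraction):
    """Return the (sign, biased_exp, fraction) triple to store.

    Assumes biased_exp is the exponent computed after an operation
    (possibly <= 0) and that the all-zero exponent encodes zero.
    """
    if biased_exp <= 0:
        # Too small to represent without subnormals: store FP +0.
        return (0, 0, 0)
    return (sign, biased_exp, fraction)
```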


======================================================================
11/22/16  
======================================================================

--------------------------------------------
"Res" input:  can it be stored in a register
--------------------------------------------

Q. In the project, the "Res" input is a 4-bit field indicating
   the user-specified resolution of the MacLaurin computation.
   Its value can range from 5-15.  Handout #34 indicates that 
   "data" on the input bus will be valid for 2 clock cycles (long
   enough to read).  Is "Res" considered data, and available this
   long?

A. YES.  You can assume that "Res", like the Input FP operand, is
   actually valid for *2 clock cycles* and therefore can be correctly
   stored in a register.  

======================================================================

--------------------------------------
barrel shifters:  left vs. right shift
--------------------------------------

Q. Can we use left/right barrel shifter in our design?

In the Gajski book, only a barrel shifter that implements right-shift
is given, but can we slightly modify this implementation to allow
a single block to implement both left and right shift?
Or should we use separate blocks, one for left shift, another for right?

A.  You can allocate a barrel shifter that does *either* right shift
    *or* left shift, but not both.  If you need both, you will need
    two barrel shifters:  one for right shift and one for left shift.
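
    As a behavioral illustration, the Python sketch below models one
    right-shift unit and one left-shift unit as separate functions.
    Each is built from fixed power-of-2 stages, which is one common
    barrel-shifter structure; the 64-bit width, the 6-stage layout and
    the function names are assumptions, not the library's definition.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit datapath width

def barrel_shift_right(value, amount):
    """Logical right shift using 6 fixed stages (1, 2, 4, ... 32)."""
    value &= MASK64
    for stage in range(6):
        if (amount >> stage) & 1:     # control bit for this stage
            value >>= (1 << stage)    # stage shifts by a fixed power of 2
    return value

def barrel_shift_left(value, amount):
    """Left shift using the same stage structure; overflow bits drop."""
    value &= MASK64
    for stage in range(6):
        if (amount >> stage) & 1:
            value = (value << (1 << stage)) & MASK64
    return value
```

    The whole shift is combinational: each stage either passes its
    input through or shifts it, so any amount from 0 to 63 is handled
    without a clocked shift register.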

======================================================================

Q. multiple sources for the same variable:

When doing the datapath for the ASM, if we find a particular variable has
multiple loads from multiple sources, should we allocate a MUX with its
own control signals to select the source(s)?

A.  Yes.  I will cover this in class.  For example, suppose you have a
generalized-ASM which includes the following statements:

   ...
   x := 0

   if (y<0) then
      x := y
   ...
   x:= 15

In step #3 of Handout #30(a) ("Allocate Datapath Blocks", also called
"resource allocation"), you would not only:

(i) allocate a sequential component for variable x, 
which (in above example) could both perform a 'load'
and a 'hold' operation (such as a basic register with parallel load;
see Handout #33 for an appropriate component); but also

(ii) allocate an input MUX which (in above example) needs 3 data inputs
and 1 output (such as a 4-to-1 MUX, with one input left unused), to
attach to the data input port of storage unit x, to allow you to select
one of the 3 possible operands (0, y, 15);  this MUX would also have
'select' signals as its control signals.

That is, in Datapath Allocation, for the above code fragment,
you would allocate both an appropriate  storage unit for x, and
an attached input MUX.  Each typically has attached control signals,
which should be clearly indicated. 
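
The storage unit plus input MUX for the fragment above can be sketched
behaviorally in Python.  The select encoding (0 for constant 0, 1 for y,
2 for constant 15), the tied-off fourth input and the function names are
assumptions chosen for illustration.

```python
def mux4(select, inputs):
    """4-to-1 MUX: `select` (0..3) picks one of four data inputs."""
    return inputs[select]

def step_x(x, load, select, y):
    """One clock cycle for storage unit x with its input MUX.

    Assumed select encoding: 0 -> constant 0, 1 -> y, 2 -> constant 15.
    Input 3 is unused and tied to 0 here.
    """
    d = mux4(select, [0, y, 15, 0])
    return d if load else x   # 'load' takes the MUX output, else 'hold'
```

The control signals here are `load` for the register and `select` for
the MUX; the FSM would drive both in each state that writes x.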

======================================================================

Q.  How different can my pseudo-code (step #1) be from my
    generalized ASM (step #2)?

A.  The pseudo-code should be very close in level to the generalized-ASM.
    See handout #30(a)/(b) as an example.  The idea is that the pseudo-code
    already is broken into simple RTL operations, that map *directly* to
    the generalized ASM.  Of course, the pseudo-code is behavioral,
    therefore inherently higher-level, and doesn't have a notion of state 
    or clock cycle (unlike the generalized ASM).

    Nonetheless, as indicated in class, the operations in pseudo-code
    and generalized ASM should be at a similar level, and the former
    ones correspond simply and directly to the latter ones.

======================================================================

Q.  WHAT TO HAND IN:  for steps #1 (pseudo-code) and #2 (generalized ASM),
    should we add comments, notations, clarifications, etc. to make 
    our submitted projects more readable and understandable?

A.  YES!  These are basic requirements, as we went over explicitly
    for Project #1.

    In particular, much like Handout #30(a)/(b) (but in more detail, 
    because you are handling a more complex subsystem):  

    	(i) listing basic variables/storage units (before step #1): 
	    
		*Before* presenting your pseudo-code, you should 
		neatly list each variable you will use.  
		For ex., the variables in handout #30(a)/(b) included 
		Input, Output, Data and Ocount.

		For each variable, indicate total # of bits,
		any breakdown into individual fields, subfield names, etc.

		I am requiring you to DRAW A SIMPLE FIGURE for each storage
		variable, indicating the above.  You can show any breakdown
		into fields in the figure.  These drawings are *before*
		doing the datapath allocation step, they are abstract
		figures of the storage units (before you have determined
		which precise unit in the library the variable maps to).

		The above is part of general documentation, whether for
		TA's, management or design review.   This documentation
		makes it easier for others to follow your pseudo-code
		and generalized ASM.

    	(ii) commenting and annotating your pseudo-code (step #1)
	    
		You must include comments within your 
		pseudo-code to document the major steps, what each part 
		does, etc.  In addition to inserted text comments, you 
		can also include brackets around groups of instructions, 
		as in handout #30(a)/(b), to highlight the larger parts of 
		your code.

    	(iii) commenting and annotating your generalized ASM (step #2)
	    
		As in Project #1, draw dotted lines around
		groups of ASM blocks that form steps, and annotate them.

		For example, you can group ASM blocks
		that handle special symbols, indicate where initialization 
		occurs, where output occurs, what operations occur in 
		which region (e.g. sinh vs. cosh), where an inner loop body 
		occurs and what it does.

		All such neat *hierarchical* documentation adds to the 
		readability of your solution, helps to present your solution 
		more clearly, and is part of professional practice.

======================================================================
================================================
RTL operations:  pseudo-code and generalized ASM
================================================

Your RTL operations should be simple.  The idea is that you are breaking
down a relatively complicated specification into a series of *very simple
steps*.  In the pseudo-code, these steps are 'behavioral' (i.e. no clock
cycles indicated).  In the generalized ASM, these steps include clock 
cycles, and hence are "scheduled":  each state box in your generalized ASM 
corresponds to an FSM state, and all operations in the state occur in the 
same clock cycle.

As a result, in the 'resource allocation step', each micro-operation 
will be mapped to a very simple unit in the component library
(i.e. Handouts #32 and #33).

DO NOT LOOK FOR COMPLEX OPERATIONS AND UNITS!  As I have indicated, unless
you get special permission from me, you must only use components in the
given library.

Also, if you start working on the problem by trying to design the 
hardware and architecture, you are off track!  Follow handout #30(a)/(b):  
your focus should be entirely on the top-level pseudo-code and 
generalized ASM first.

---------------------
basic RTL operations:
---------------------

Typical RTL operations are very simple, such as are included
in Handout #30(a)/(b) (1's counter).  Each can be performed within 
1 clock cycle, and have a direct match to associated hardware units:

   x :=  y + z
   x :=  y - z 
   x := y >> 3
   h := j
   j := j + 1
   z := (x > y) [sequential]
   or simply (x > y) [combinational]
   ... etc.

Most of your RTL operations will look like the above.  

-------------------------------------------
allowed multi-step operations ('chaining'): 
-------------------------------------------

See details in requirements of Handout #34.
I indicate that in a few simple cases, you can have 2 combinational 
operations, one after the other, in the same clock cycle.  You should 
*not* assume a single 'merged' datapath block.
Instead this involves 2 separate blocks, and you must identify *both* 
operations explicitly in your pseudo-code and generalized ASM.

For example, I have allowed a combinational shift before (or after) an 
addition.  To specify this in pseudo-code and the generalized ASM, you
simply list a composite operation:

       e.g. x :=  (a << 1) + b  -- shifts a before adding to b
            y := (a - b) >> 3   -- shifts result after subtracting a - b 

In this case, since you are doing combinational shifting instead of sequential 
shifting, you would allocate a barrel shifter in the resource allocation for
the shift part of the operation.  Both combinational components would be
explicitly allocated in the 'resource allocation step', and they would be
tied together appropriately for the given composite (i.e. chained) operation.
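
As a behavioral illustration, the Python sketch below models the two
examples above as two separate combinational blocks tied together.  The
64-bit width, unsigned wraparound arithmetic, logical right shift and
all function names are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit datapath width

def barrel_shift_left(a, k):
    return (a << k) & MASK64

def barrel_shift_right(a, k):
    return (a & MASK64) >> k

def adder(a, b):
    return (a + b) & MASK64

def subtractor(a, b):
    return (a - b) & MASK64

def chained_shift_add(a, b):
    # x := (a << 1) + b   (shifter output feeds the adder)
    return adder(barrel_shift_left(a, 1), b)

def chained_sub_shift(a, b):
    # y := (a - b) >> 3   (subtractor output feeds the shifter)
    return barrel_shift_right(subtractor(a, b), 3)
```

Each chained function calls exactly two blocks, matching the two
components that would be listed in the resource allocation.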

However, very few operations can be chained under my guidelines.  See explicit
discussion in Handout #34. 

Normally, an RTL operation is simply a single step!

====================================================================== 
================================
SPECIAL FP OPERATIONS:  SIGNED 0
================================

Q. How do I do operations with the special reserved FP symbol, +/-0?

A.  First, for your operand, if it is 0, assume it can be treated in
    FP as +0.  So you will never be adding two -0 operands.  Of course, the
    integer operand is never infinity.

    Given these constraints, the relevant operations with signed 0 are:

      For any N:

    	  +0 + N = N  
    	  -0 + N = N  
    	  +0 * N = +0  
    	  -0 * N = -0  

======================================================================