Notes for CS W3824 04s  4/14/04

FLOATING POINT EXAMPLES

For the following examples, assume we are using a miniature version of
the IEEE 754 Floating Point Standard.  A floating point number will be
represented by an 8-bit word.  The first bit is the sign bit.  The
next 3 bits represent the BE (biased exponent).  Following the usual
procedure, for a k-bit BE, the bias is 2^(k-1)-1; in this case that is
3.  The last 4 bits represent the fraction.  So the decimal number
-6.5, which in binary would be written as -110.1 = -1.101 x 2^2, would
be represented as 11011010: sign bit 1, BE 101 (the exponent 2 plus the
bias 3), and fraction 1010.
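
The layout just described can be checked with a short Python sketch.
The name decode is hypothetical, and the sketch assumes every word is
a normalized number; the exponent patterns that the full standard
reserves for zero, denormals, infinity and NaN are ignored here.

```python
BIAS = 3  # 2**(k-1) - 1 with k = 3 exponent bits

def decode(word: str) -> float:
    """Return the value of an 8-bit word: 1 sign bit, 3-bit BE, 4-bit fraction."""
    bits = word.replace(" ", "")
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected 8 binary digits")
    sign = -1 if bits[0] == "1" else 1
    be = int(bits[1:4], 2)            # biased exponent
    fraction = int(bits[4:], 2)       # the 4 fraction bits as an integer
    significand = 1 + fraction / 16   # restore the implicit leading 1
    return sign * significand * 2.0 ** (be - BIAS)

print(decode("11011010"))  # -6.5
```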

EXAMPLE 1.  Floating point addition.  Find the floating point number
representation of A+B, where the floating point representations of A
and B are:
A:  1110 0100
B:  0100 1100

First we note that A is negative, B is positive, and that the
magnitude of A exceeds that of B (A has the larger BE, 110 versus
100).  This means that we must subtract the magnitude of B from that
of A and that the result will be a negative number.

Below we represent each number by writing the significand (1 followed
by a binary point and the fraction) and then the BE.

A: 1.0100 110
B: 1.1100 100

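The first step is to align the numbers by shifting the significand of
the smaller number to the right by an amount equal to the difference
in the BE's.  In this case the difference is 110-100=10 (two), so we
shift B to the right 2 positions, making its BE the same as that for
A.  Two extra bits are kept after the binary point so that no bits of
B are lost.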

A: 1.010000 110
B: 0.011100 110

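Now we subtract the significand of B from that of A, obtaining the
magnitude of the result:
|A|-|B|: 0.110100  110

We then normalize by shifting left one position and subtracting 1 from
the BE, obtaining:
|A|-|B|: 1.1010  101

Now convert the result to a floating point number (remembering that
the sign is negative, and that we must omit the leading 1).

A+B: 1101 1010  The desired result.  (This is -6.5, that is, -10+3.5.)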
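
The steps of Example 1 (align, subtract, normalize, repack) can be
sketched in Python as follows.  The names fields and
add_opposite_signs are hypothetical.  The sketch only covers operands
of opposite sign whose result is exact and nonzero; it does no
rounding and does not treat any exponent pattern as special.

```python
def fields(word):
    """Split a word into (sign, BE, 5-bit significand with the leading 1)."""
    bits = word.replace(" ", "")
    return int(bits[0]), int(bits[1:4], 2), int("1" + bits[4:], 2)

def add_opposite_signs(a, b):
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    if sa == sb:
        raise ValueError("this sketch only covers operands of opposite sign")
    # Put the operand of larger magnitude first; it supplies the result sign.
    if (eb, mb) > (ea, ma):
        sa, ea, ma, eb, mb = sb, eb, mb, ea, ma
    shift = ea - eb
    # Align: give the larger operand `shift` extra low bits.  Relative to it,
    # the smaller significand is thereby shifted right by `shift` positions.
    diff = (ma << shift) - mb          # subtract the magnitudes
    if diff == 0:
        raise ValueError("a zero result is outside this sketch")
    be = ea
    one = 1 << (4 + shift)             # position of the leading 1
    while diff < one:                  # normalize: shift left, lower the BE
        diff <<= 1
        be -= 1
    if be < 0 or diff & ((1 << shift) - 1):
        raise ValueError("result needs rounding or underflows")
    fraction = (diff >> shift) & 0b1111   # drop the leading 1
    return f"{sa}{be:03b}{fraction:04b}"

print(add_opposite_signs("1110 0100", "0100 1100"))  # 11011010
```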

EXAMPLE 2.  Floating point multiplication.  Find the product AxB in
floating point form, where the floating point representations of A
and B are:
A:  1101 1000
B:  1011 0110

Since both numbers are negative, the resulting product will be
positive.

As in Example 1, we represent each number by writing the significand
(1 followed by a binary point and the fraction) and then the BE.

A: 1.1000 101
B: 1.0110 011

To get the initial BE of the result, add the BE's and then subtract
the bias (3), since the sum of the two BE's contains the bias twice.
This gives 101+011=1000, and then 1000-11=101.

Now multiply the significands (trailing zeros dropped).
1.1 x 1.011 = 10.0001
Including the BE at the end, the result, so far, is
10.0001 101

Now we normalize, moving the binary point one position to the left and
adding 1 to the BE, to obtain:
1.00001 110

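Since there are 5 bits after the binary point and only 4 can be
stored, we must round.  The bit to be discarded is a 1 with nothing
after it, so the value lies exactly halfway between 1.0000 and 1.0001.
Using the round-to-even method, a tie goes to the neighbor whose last
retained bit is 0.  We see that the rightmost of the 4 retained bits
is already a 0, so we do NOT change this bit, which gives us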
1.0000 110
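
The rounding rule can be sketched in Python on integers, where the
significand 1.00001 is held as the integer 0b100001.  The function name
round_to_even and its parameters are hypothetical.  A carry out of the
top bit after rounding up would need renormalizing, which this sketch
leaves to the caller.

```python
def round_to_even(value: int, drop: int) -> int:
    """Discard the low `drop` bits of value, rounding to nearest, ties to even."""
    if drop == 0:
        return value
    kept = value >> drop                    # the bits that are retained
    rest = value & ((1 << drop) - 1)        # the bits being discarded
    half = 1 << (drop - 1)                  # exactly halfway
    if rest > half or (rest == half and kept & 1):
        kept += 1                           # round up; on a tie only if kept is odd
    return kept

print(bin(round_to_even(0b100001, 1)))  # 0b10000  (1.00001 -> 1.0000)
print(bin(round_to_even(0b100011, 1)))  # 0b10010  (1.00011 -> 1.0010)
```

The second call shows the other side of the rule: when the last
retained bit is 1, a tie rounds up so that the result ends in 0.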

Putting this in floating point form results in the final answer:
01100000
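
The whole of Example 2 can be sketched in Python as follows.  The
names fields and multiply are hypothetical.  The sketch assumes
normalized operands and simply rejects a BE outside the range 0 to 7
rather than handling overflow or underflow.

```python
BIAS = 3

def fields(word):
    """Split a word into (sign, BE, 5-bit significand with the leading 1)."""
    bits = word.replace(" ", "")
    return int(bits[0]), int(bits[1:4], 2), int("1" + bits[4:], 2)

def multiply(a, b):
    sa, ea, ma = fields(a)
    sb, eb, mb = fields(b)
    sign = sa ^ sb                  # like signs give a positive product
    be = ea + eb - BIAS             # the sum holds the bias twice
    product = ma * mb               # 8 fraction bits, value in [1, 4)
    drop = 4                        # bits to discard to get back to 4
    if product >= (2 << 8):         # 10.xxx or 11.xxx: normalize
        be += 1
        drop = 5
    kept = product >> drop
    rest = product & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and kept & 1):
        kept += 1                   # round to nearest, ties to even
    if kept == 0b100000:            # rounding carried into a new leading bit
        kept >>= 1
        be += 1
    if not 0 <= be <= 7:
        raise OverflowError("BE does not fit in 3 bits")
    return f"{sign}{be:03b}{kept & 0b1111:04b}"

print(multiply("1101 1000", "1011 0110"))  # 01100000
```

The exact product is -6 x -1.375 = 8.25; the stored result 01100000
represents 8, the difference being the rounding error.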