Microblaze interrupts

Microblaze has a classical interrupt system.

Interrupt handlers

When an interrupt is detected and interrupts are enabled, Microblaze (mb for short) stops executing the current code and jumps to address 0x00000010.
Briefly, the interrupt handler has to:

1. save the context (mainly the general-purpose registers)
2. acknowledge the interrupt
3. service the interrupt
4. restore the context and execute rtid (return from interrupt)

As mb has only one interrupt input, an interrupt controller is needed when multiple devices can raise interrupts.
The service routine then "asks" the controller which device(s) caused the interrupt and acts accordingly.

To make things simple, Xilinx provides a low-level interrupt routine:

save context
read interrupt status word (i.e. which interrupts are active)
for all active interrupts (in priority order) do
    acknowledge the interrupt
    call specific_device_handler (the handler addresses are stored in a table)
restore context
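
The loop in the pseudocode above can be sketched in plain C. This is a host-side simulation under stated assumptions, not the Xilinx code: the status word is an ordinary variable instead of a controller register, bit 0 is assumed to have the highest priority, context save/restore is left out, and all names (sim_status, sim_ack_log, handler_table, register_handler, dispatch_interrupts, count_handler) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_SOURCES 8u

typedef void (*device_handler_t)(void *arg);

typedef struct {
    device_handler_t handler;   /* the specific_device_handler */
    void *arg;                  /* passed back to the handler unchanged */
} handler_entry_t;

/* Simulated controller state; on real hardware these are registers. */
uint32_t sim_status;            /* one bit per pending interrupt */
uint32_t sim_ack_log;           /* remembers which bits were acknowledged */
handler_entry_t handler_table[NUM_SOURCES];

void register_handler(unsigned source, device_handler_t handler, void *arg)
{
    if (source < NUM_SOURCES) {
        handler_table[source].handler = handler;
        handler_table[source].arg = arg;
    }
}

/* Body of the low-level routine, without context save/restore.
 * Returns the number of device handlers that were called. */
unsigned dispatch_interrupts(void)
{
    uint32_t pending = sim_status;      /* read the status word once */
    unsigned served = 0;

    for (unsigned i = 0; i < NUM_SOURCES; i++) {   /* assumed priority: bit 0 first */
        uint32_t mask = (uint32_t)1 << i;
        if ((pending & mask) == 0)
            continue;
        sim_status &= ~mask;            /* acknowledge the interrupt */
        sim_ack_log |= mask;
        if (handler_table[i].handler != NULL) {
            handler_table[i].handler(handler_table[i].arg);
            served++;
        }
    }
    return served;
}

/* Example device handler: just counts how often it was called. */
void count_handler(void *arg)
{
    (*(unsigned *)arg)++;
}
```

Registering count_handler for two sources, setting their bits in sim_status and calling dispatch_interrupts() runs each handler once and leaves sim_status cleared.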

You are encouraged to read the actual code in ./mymicroblaze/libsrc/intc_v1_00_b/src/xintc_l.c

So your task is to write only the specific_device_handler (one for each device that generates an interrupt).
You are spared the low-level details above and can write the handler as a normal C function. But ...

Interrupt processing time

Interrupt handlers have to be very fast. This is a matter of life and death.
In a normal system, they are written in hand optimized assembly. No programming effort is spared here.

You will do it in C, to make your life easier; however, you are supposed to write fast code.
The interrupt handlers will be placed in LMB memory, so they will run at full speed (see Microblaze timing).

The low level handler executes in 3 us.
The user handlers take a variable amount of time. Let's make some estimates:

EXAMPLE
ak4565 audio driver.
This generates an interrupt every ~2.66 ms.
For audio output, 128 samples (one sample = one 32-bit word = 16 bits left + 16 bits right) must be transferred to the audio device on each interrupt.
(You can check that this gives 128 * 1000 / 2.66 = ~48000 samples/s.)

Suppose the data is read from a big FIFO (let's say 32kb), which is obviously stored in the OPB SRAM.
So, for each sample we'll do a 32-bit load from OPB SRAM and a 32-bit store to the audio device.
Total: 128 * (1 + 8 + 1 + 3) = 128 * 13 = 1664 cycles, which is 33.3 us (these figures imply a 50 MHz clock), plus some loop overhead: ~36 us.

If we add audio input, the time will be 2*33.3us + overhead = ~70 us.
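
The arithmetic above is easy to redo for other block sizes or clock rates. The helpers below are a small sketch; the 50 MHz clock is an assumption derived from the figures in the text (1664 cycles = 33.3 us), and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 50 MHz, implied by "1664 cycles = 33.3 us" in the text. */
#define CPU_HZ 50000000.0

/* Cycles needed to move a block of samples at a fixed per-sample cost. */
uint32_t transfer_cycles(uint32_t samples, uint32_t cycles_per_sample)
{
    return samples * cycles_per_sample;
}

/* Convert a cycle count to microseconds at CPU_HZ. */
double cycles_to_us(uint32_t cycles)
{
    return (double)cycles * 1e6 / CPU_HZ;
}

/* Sample rate that results from moving a block on every interrupt. */
double samples_per_second(uint32_t samples_per_irq, double irq_period_ms)
{
    return (double)samples_per_irq * 1000.0 / irq_period_ms;
}
```

With the numbers from the text, transfer_cycles(128, 1 + 8 + 1 + 3) gives 1664 cycles, cycles_to_us(1664) gives about 33.3 us, and samples_per_second(128, 2.66) gives roughly 48000.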

Now imagine some naive C code:

for (i=0;i<128;i++)
{
    if (fifo_read_ix == fifo_write_ix) overruns++;
    XIo_Out32 (dac_addr += 4, fifo [ fifo_read_ix ++ ] );
    if (fifo_read_ix >= FIFO_SIZE) fifo_read_ix = 0;
}

This executes in ~50 cycles per sample, which is about 4 times slower than the estimate!
If similar code is written for audio input, the total will be ~280 us!

Is it that bad? Yes. In fact, even 70 us is not brilliant at all.
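
One possible way to get closer to the estimate is to move the checks out of the inner loop: compute once how many words are available, split the block into at most two contiguous runs (before and after the wrap), and copy each run with a tight pointer loop. The sketch below is an illustration only and makes several assumptions: FIFO_SIZE is a hypothetical power of two (8192 words), the destination is a plain pointer standing in for the device window, and the function name is invented. It is not a drop-in replacement for the naive loop, because on an underrun it sends fewer words and counts the event once. No cycle counts are claimed for it; measure on the target.

```c
#include <stdint.h>

#define FIFO_SIZE 8192u   /* words; hypothetical, must be a power of two */
#define BLOCK     128u    /* words moved per interrupt */

uint32_t fifo[FIFO_SIZE];
uint32_t fifo_read_ix;
uint32_t fifo_write_ix;
uint32_t underruns;

/* Copy up to BLOCK words from the ring buffer to dst.
 * Returns the number of words actually copied. */
uint32_t audio_out_block(volatile uint32_t *dst)
{
    uint32_t rd = fifo_read_ix;
    /* Words available, wrap included; the mask works because
     * FIFO_SIZE is a power of two. */
    uint32_t avail = (fifo_write_ix + FIFO_SIZE - rd) & (FIFO_SIZE - 1u);
    uint32_t n = BLOCK;
    uint32_t first;
    const uint32_t *src;
    uint32_t i;

    if (avail < n) {          /* checked once per block, not per sample */
        n = avail;
        underruns++;
    }

    first = FIFO_SIZE - rd;   /* words left before the wrap point */
    if (first > n)
        first = n;

    src = &fifo[rd];
    for (i = 0; i < first; i++)   /* run 1: up to the end of the buffer */
        *dst++ = *src++;

    src = fifo;
    for (i = first; i < n; i++)   /* run 2: from the start of the buffer */
        *dst++ = *src++;

    rd += n;
    if (rd >= FIFO_SIZE)
        rd -= FIFO_SIZE;
    fifo_read_ix = rd;
    return n;
}
```

The design choice is simply to pay for the empty check and the wrap check once per interrupt instead of once per sample.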

Response time

The critical aspect is that devices cannot wait forever for an interrupt to be served.
Most of them handle continuous data streams; if the interrupt is not served quickly, a buffer overflow or underflow can occur.

Reasons why an interrupt may not be served immediately:
- the context switching delay (don't forget it)
- another interrupt may currently be handled
- the user code may be executing a critical section with interrupts disabled

But how long can the device wait? To answer this, we have to understand what a handler is supposed to do.

Interrupt driven devices - no FIFO

Let's assume we have a very simple UART with no FIFO. Let's invent an 800 kbps (100 kbytes/s) serial device.
An interrupt occurs in 2 situations:
- one byte is received, so the UART now has 1 byte available for reading
- the transmit buffer (previously full) becomes empty, so the UART has just transmitted 1 byte and is ready for another

The handler will be extremely fast: it reads or writes 1 byte (this may take 0.3 us). But remember the interrupt handler overhead: 3 us.
In 1 s the handler will waste 300 ms even though the real transfer takes only 30 ms!

Err, we can have full-speed RX and TX at the same time. In the WORST CASE (we are playing with the devil, remember?)
the RX / TX interrupts will arrive "interleaved", so they will need not 300 + 2*30 but 2*(300+30) = 660 ms!

How long can the device wait? We have to service the interrupt within 10 us (one byte time at 100 kbytes/s), otherwise a buffer overflow / underflow can occur!
Obviously, this UART cannot run alongside our audio device, whose interrupt takes 70 us to handle.
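
The numbers in this section can be reproduced with a few lines of C. This is only a sketch of the arithmetic; the constants are the figures from the text and the function names are hypothetical.

```c
/* Figures taken from the text above. */
#define BYTES_PER_SEC    100000.0   /* 800 kbps = 100 kbytes/s */
#define IRQ_OVERHEAD_US  3.0        /* low-level handler overhead */
#define BYTE_TRANSFER_US 0.3        /* reading or writing one byte */

/* CPU time, in ms per second of wall time, spent on interrupts. */
double cpu_ms_per_second(double irqs_per_sec, double us_per_irq)
{
    return irqs_per_sec * us_per_irq / 1000.0;
}

/* Without a FIFO the next byte arrives one byte time later,
 * so that is the deadline for serving the interrupt. */
double byte_deadline_us(double bytes_per_sec)
{
    return 1e6 / bytes_per_sec;
}

/* Nonzero if the longest time the CPU may be busy elsewhere
 * still fits within the deadline. */
int meets_deadline(double deadline_us, double worst_blocking_us)
{
    return worst_blocking_us <= deadline_us;
}
```

For example, cpu_ms_per_second(2.0 * BYTES_PER_SEC, IRQ_OVERHEAD_US + BYTE_TRANSFER_US) reproduces the 660 ms worst case, and meets_deadline(byte_deadline_us(BYTES_PER_SEC), 70.0) is 0, which is the conflict with the audio handler.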

Adding FIFOs

Adding TX / RX FIFOs helps a lot. Assume a 64-byte input FIFO and a 64-byte output FIFO.

The question is : when should the device generate the interrupt ?

Let's take TX: we can generate an interrupt when the TX FIFO becomes empty.
Then the CPU can fill the whole FIFO and wait for the next interrupt (this approach is used in uartlite).
Advantages: for streaming output, we have 64 times fewer interrupts! That means one every 640 us instead of one every 10 us.
Disadvantages:
- if we don't serve the interrupt within 10 us, an underflow still occurs (maybe no big deal for a UART, but nasty glitches for an audio device)
- the service time increases from 3.3 us to 5 us (looks acceptable: we write 64 bytes at once in less than twice the time)

Solution used in practice: generate the interrupt when the buffer is half empty.
Twice as many interrupts, but the service time of each is also (slightly) reduced, and we have 320 us to serve the interrupt before an underflow happens!

Still, in both cases we have a problem: if the output is not streaming, we can end up generating a lot of small interrupts, as in the no-FIFO case.
This can be solved by using a certain "hysteresis".

You can design your own schemes; still, everything is a compromise between total service time, per-interrupt service time, and required response time.
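
The trade-off between the two thresholds can be put into a small helper. This is a sketch with hypothetical names; it models only the bytes held in the FIFO, so for the "empty" threshold it reports zero slack, while the 10 us quoted in the text is read here as the one byte time that remains outside the FIFO (an assumption about the author's model).

```c
typedef struct {
    double interval_us;     /* time between interrupts when streaming */
    double fifo_slack_us;   /* time until the FIFO itself runs dry */
} tx_timing_t;

/* depth:     FIFO size in bytes
 * threshold: bytes still in the FIFO when the interrupt fires
 *            (0 = interrupt on empty, depth / 2 = half empty)
 * byte_us:   time to transmit one byte */
tx_timing_t tx_threshold_timing(unsigned depth, unsigned threshold, double byte_us)
{
    tx_timing_t t;

    if (threshold > depth)
        threshold = depth;  /* a threshold above the depth makes no sense */

    /* Each interrupt refills the FIFO, i.e. writes depth - threshold bytes. */
    t.interval_us = (double)(depth - threshold) * byte_us;
    /* The bytes still queued keep the transmitter busy meanwhile. */
    t.fifo_slack_us = (double)threshold * byte_us;
    return t;
}
```

With depth 64 and 10 us per byte, threshold 0 gives one interrupt every 640 us with no slack in the FIFO, and threshold 32 gives one every 320 us with 320 us of slack.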

Now let's look at RX: we can generate an interrupt when the RX FIFO is not empty.
The CPU will then read all the data in the UART and wait for the next incoming data (this is also used in uartlite).
Again, we can use half-full interrupts and make a compromise.
Try to see the advantages / disadvantages.
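
As an illustration of the "read everything that is there" RX approach, here is a host-side sketch. The UART is simulated with a small struct, so none of this is the uartlite driver API; sim_uart_t, the sim_uart_* functions, the software buffer and uart_rx_handler are all hypothetical names chosen for the example.

```c
#include <stdint.h>

#define HW_FIFO_DEPTH 64u    /* simulated hardware RX FIFO */
#define SW_BUF_SIZE   256u   /* software ring buffer; power of two */

/* Simulated UART receive FIFO (stands in for the device registers). */
typedef struct {
    uint8_t data[HW_FIFO_DEPTH];
    unsigned head;           /* index of the oldest byte */
    unsigned count;          /* bytes currently stored */
} sim_uart_t;

int sim_uart_rx_empty(const sim_uart_t *u)
{
    return u->count == 0;
}

/* Simulates a byte arriving on the line; returns 0 on overflow. */
int sim_uart_push(sim_uart_t *u, uint8_t byte)
{
    if (u->count == HW_FIFO_DEPTH)
        return 0;
    u->data[(u->head + u->count) % HW_FIFO_DEPTH] = byte;
    u->count++;
    return 1;
}

/* Caller must check sim_uart_rx_empty() first. */
uint8_t sim_uart_read(sim_uart_t *u)
{
    uint8_t byte = u->data[u->head];
    u->head = (u->head + 1u) % HW_FIFO_DEPTH;
    u->count--;
    return byte;
}

/* Software buffer filled by the handler and emptied by user code. */
uint8_t rx_buf[SW_BUF_SIZE];
volatile unsigned rx_head;   /* written by the handler */
volatile unsigned rx_tail;   /* written by the user code */
unsigned rx_dropped;

/* Device handler: drain the hardware FIFO in one interrupt. */
void uart_rx_handler(void *arg)
{
    sim_uart_t *u = (sim_uart_t *)arg;

    while (!sim_uart_rx_empty(u)) {
        uint8_t byte = sim_uart_read(u);
        unsigned next = (rx_head + 1u) & (SW_BUF_SIZE - 1u);
        if (next == rx_tail) {   /* software buffer full: drop the byte */
            rx_dropped++;
            continue;
        }
        rx_buf[rx_head] = byte;
        rx_head = next;
    }
}
```

The handler does no processing at all; it only moves bytes, which keeps the per-interrupt service time short and leaves the real work to the user code.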

Good news

Our UART works at 9600 bps, almost 100 times slower than the example above.
You have enough time to do anything. No compromise is needed.
But for the final project, do not ignore these matters if you target high-speed I/O.