C.A.P.S.L.

Summer 2002 Project - Network Processor IXP1200

--------

                Next Generation Processor -- Network Processor

There is significant interest in network processors that can handle packets arriving at multi-gigabit line speeds. Designed primarily to implement forwarding engines in IP routers, these network processors offer one clear advantage over ASIC-based solutions: they are programmable.

The IXP1200 is one of the commonly used network processors from Intel. The chip contains a general-purpose StrongARM processor core and six special-purpose micro-Engines running at 177 MHz. Each micro-Engine supports four hardware contexts, for a total of 24 contexts, and has its own 4KB instruction store. The StrongARM is responsible for loading these micro-Engine instruction stores; the StrongARM's own instructions are fetched from DRAM. A 4KB on-chip scratch memory is used for synchronization and control of the micro-Engines.

The chip also has a pair of FIFOs, one for receiving packets from and one for sending packets to the network ports on the IX bus. Each "FIFO" is actually an addressable register file of 16 slots of 64 bytes each. It is up to the programmer to use these register files so that they behave as FIFOs.
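To make the last point concrete, here is a minimal sketch of how software can impose FIFO ordering on an addressable 16-slot register file. It is a plain Python model, not IXP1200 microcode; the class name SlotFifo and its push/pop methods are hypothetical names chosen for illustration.

```python
SLOTS = 16        # slots per register file (from the text)
SLOT_BYTES = 64   # bytes per slot (from the text)


class SlotFifo:
    """Hypothetical model: FIFO discipline over an addressable register file."""

    def __init__(self):
        # Every slot is individually addressable; nothing here enforces order.
        self.slots = [bytes(SLOT_BYTES) for _ in range(SLOTS)]
        self.head = 0    # index of the oldest filled slot
        self.count = 0   # number of filled slots

    def push(self, chunk):
        """Store one chunk of at most 64 bytes; return False if all slots are full."""
        if len(chunk) > SLOT_BYTES:
            raise ValueError("chunk larger than one slot")
        if self.count == SLOTS:
            return False
        tail = (self.head + self.count) % SLOTS  # next free slot, wrapping around
        self.slots[tail] = bytes(chunk)
        self.count += 1
        return True

    def pop(self):
        """Return the oldest chunk, or None if no slot is filled."""
        if self.count == 0:
            return None
        chunk = self.slots[self.head]
        self.head = (self.head + 1) % SLOTS
        self.count -= 1
        return chunk
```

The FIFO behavior comes entirely from the head index and the fill count that the software maintains; the slots themselves can still be read or written in any order.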

The most natural use of the DRAM is to buffer packets. The DRAM is connected to the processor by a 64-bit data path clocked at 88 MHz, giving a peak rate of 64 bits * 88 MHz = 5.6 Gbps for moving packets into and out of DRAM. Similarly, SRAM is a natural place to store the routing table, along with any necessary per-flow state. The SRAM data path has a peak transfer rate of 32 bits * 88 MHz = 2.8 Gbps.
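The peak rates above are simply bus width times clock rate. The short sketch below reproduces that arithmetic and, as an illustration only, derives a rough packet-rate ceiling. The ceiling rests on two assumptions that are not in the text: packets are 64 bytes long, and each packet crosses the DRAM bus exactly twice (once in, once out). The function names are hypothetical.

```python
def peak_gbps(width_bits, clock_mhz):
    """Peak transfer rate in Gbps: one bus-width transfer per clock cycle."""
    bits_per_second = width_bits * clock_mhz * 1_000_000
    return bits_per_second / 1e9


def packet_rate_ceiling(gbps, packet_bytes=64, bus_crossings=2):
    """Upper bound in packets per second under the stated assumptions.

    packet_bytes and bus_crossings are illustrative assumptions, not
    figures taken from the text.
    """
    return gbps * 1e9 / (packet_bytes * 8 * bus_crossings)


dram_gbps = peak_gbps(64, 88)   # 5.632, quoted as 5.6 Gbps in the text
sram_gbps = peak_gbps(32, 88)   # 2.816, quoted as 2.8 Gbps in the text
dram_pps = packet_rate_ceiling(dram_gbps)
```

These are peak figures; any real workload will see less, since the bound ignores access latency and contention for the bus.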

Experiments have shown that the performance bottleneck lies not in processor speed but in memory bandwidth.

Network processors commonly employ parallelism to hide memory access latency. The Intel IXP1200 does so through the four hardware contexts on each of its six micro-Engines: when the current context stalls on a memory operation, the micro-Engine automatically switches to another context.
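The effect of context switching on latency hiding can be illustrated with a toy model. In the sketch below each context repeatedly computes for a fixed number of cycles and then stalls on a memory access, and a single engine runs whichever context is ready. The cycle counts (10 compute, 30 memory) are hypothetical values chosen for illustration, not measured IXP1200 latencies, and a context switch is assumed to cost nothing.

```python
def utilization(contexts, compute_cycles, memory_cycles, total_cycles=10_000):
    """Fraction of cycles one engine spends computing in a toy model."""
    ready_at = [0] * contexts   # cycle at which each context may run again
    busy = 0
    now = 0
    while now < total_cycles:
        # Choose the context that becomes ready earliest.
        ctx = min(range(contexts), key=lambda i: ready_at[i])
        if ready_at[ctx] > now:
            # Every context is stalled on memory, so the engine idles.
            now = ready_at[ctx]
            if now >= total_cycles:
                break
        run = min(compute_cycles, total_cycles - now)
        busy += run
        now += run
        # The context now stalls until its memory operation completes.
        ready_at[ctx] = now + memory_cycles
    return busy / total_cycles


one_context = utilization(1, 10, 30)
four_contexts = utilization(4, 10, 30)
```

With these illustrative numbers a single context keeps the engine busy a quarter of the time, while four contexts overlap each other's stalls completely. Once the engine is saturated, adding further contexts does not help in this model.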

Programming such a network processor involves two challenges. First, assuming one can identify a single function that the network processor has to support, the parallel hardware contexts must be programmed in a way that fully utilizes the available memory bandwidth and hence sustains the maximum required packet rate. The second challenge is to allow the network processor to be programmed more dynamically. Instead of running a single fixed function, it should be possible to load a new function (or perhaps an extension) into the network processor, ideally at runtime.