Implementing Applications on a Cellular Architecture - the Mandelbrot-set.
Authors: Jason McGuiness, Colin Egan, Guang Gao

Abstract:


There is an ever-widening gap between CPU speed and memory speed, resulting in a 'memory wall' where the time for memory accesses dominates performance.
Cellular architectures, such as the Cyclops family, have been developed to overcome this 'memory wall' by implementing processors-in-memory (PIM) on the
same chip. PIM architectures achieve high performance by increasing the bandwidth of processor-memory communication and reducing latency. In this paper we
introduce DIMES (the Delaware Iterative Multiprocessor Emulation System), which is being developed by CAPSL at the University of Delaware as a hardware
validation tool for cellular architectures. The version of DIMES used in this paper is a simplified hardware implementation of the Cyclops-64 cellular
architecture developed at the IBM T. J. Watson Research Center. Since DIMES is a hardware validation tool, its hardware implementation is constrained to a
dual-processor design in which each processor has four thread units.

DIMES memory is restricted to 16K of local scratch-pad memory per processor and 64K of global shared memory. Additionally, DIMES is linked to a host computer for I/O. We have chosen to use a Mandelbrot-set generator (written in C++) with a work-stealing algorithm as our metric to evaluate the programming model on DIMES.
The Mandelbrot-set generator has been threaded, and the work-stealing algorithm achieves load balancing among DIMES' threads. The Mandelbrot example
demonstrates effective use of DIMES' threads, scratch-pad memory and global memory in its CRTS
environment. The results of the study are highly promising and show that DIMES is an ideal hardware tool for validating future Cyclops enhancements.
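The abstract does not show the generator's code. As a rough, hypothetical illustration of how a threaded Mandelbrot-set generator can balance load by work stealing, the following C++17 sketch uses standard `std::thread`s as stand-ins for DIMES thread units (eight workers would mirror two processors with four thread units each). Rows are first split into contiguous blocks, one block per worker; a worker whose own queue is empty steals rows from the front of another worker's queue. All names (`escape_time`, `WorkQueue`, `mandelbrot_work_stealing`) are assumptions made for illustration, and the sketch ignores DIMES specifics such as scratch-pad placement and the CRTS runtime. It is not the authors' implementation.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Escape-time iteration count for one point c of the complex plane.
int escape_time(std::complex<double> c, int max_iter) {
    std::complex<double> z{0.0, 0.0};
    int n = 0;
    while (n < max_iter && std::norm(z) <= 4.0) {  // std::norm is |z|^2, so this tests |z| <= 2
        z = z * z + c;
        ++n;
    }
    return n;
}

// Hypothetical per-worker queue of row indices, guarded by a mutex.
// The owner takes rows from the back; thieves take rows from the front.
struct WorkQueue {
    std::mutex m;
    std::deque<int> rows;

    std::optional<int> pop_back() {
        std::lock_guard<std::mutex> lock(m);
        if (rows.empty()) return std::nullopt;
        int r = rows.back();
        rows.pop_back();
        return r;
    }
    std::optional<int> steal_front() {
        std::lock_guard<std::mutex> lock(m);
        if (rows.empty()) return std::nullopt;
        int r = rows.front();
        rows.pop_front();
        return r;
    }
};

// Computes a width x height image of escape counts over the region
// [-2, 1) x [-1.5, 1.5) using num_workers threads with work stealing.
std::vector<int> mandelbrot_work_stealing(int width, int height, int max_iter, int num_workers) {
    assert(width > 0 && height > 0 && num_workers > 0);
    std::vector<int> image(static_cast<std::size_t>(width) * height, 0);
    std::vector<WorkQueue> queues(num_workers);

    // Contiguous block distribution: rows near the set are costly, so blocks are unbalanced.
    for (int r = 0; r < height; ++r)
        queues[static_cast<std::size_t>(r) * num_workers / height].rows.push_back(r);

    auto worker = [&](int id) {
        for (;;) {
            std::optional<int> row = queues[id].pop_back();
            // Own queue is empty: try to steal from each other worker in turn.
            for (int k = 1; !row && k < num_workers; ++k)
                row = queues[(id + k) % num_workers].steal_front();
            // No rows are ever added after start-up, so finding every queue empty means we are done.
            if (!row) return;
            int y = *row;
            for (int x = 0; x < width; ++x) {
                std::complex<double> c{-2.0 + 3.0 * x / width, -1.5 + 3.0 * y / height};
                // Each row is taken by exactly one worker, so these writes never overlap.
                image[static_cast<std::size_t>(y) * width + x] = escape_time(c, max_iter);
            }
        }
    };

    std::vector<std::thread> threads;
    for (int id = 0; id < num_workers; ++id) threads.emplace_back(worker, id);
    for (auto& t : threads) t.join();
    return image;
}
```

For example, `mandelbrot_work_stealing(64, 32, 100, 8)` returns a 64×32 image of escape counts computed by eight workers. The contiguous initial split is chosen deliberately: rows that cross the set cost far more iterations than rows outside it, so without stealing the workers that own the middle rows would finish last while the others sat idle.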