Implementing Applications on a Cellular Architecture - the
Mandelbrot-set.
Authors: Jason McGuiness, Colin Egan, Guang Gao
Abstract:
There is an ever-widening gap between CPU speed and memory
speed, resulting in a 'memory wall' where the time for memory accesses dominates
performance.
Cellular architectures, such as the Cyclops family, have been developed to overcome
this 'memory wall' by implementing processors-in-memory (PIM) on the
same chip. PIM architectures achieve high performance by increasing the bandwidth
of processor-memory communication and reducing latency. In this paper we
introduce DIMES (the Delaware Iterative Multiprocessor Emulation System), which
is being developed by CAPSL at the University of Delaware as a hardware
validation tool for cellular architectures. The version of DIMES used in this
paper is a simplified hardware implementation of the Cyclops-64 cellular
architecture developed at the IBM T. J. Watson Research Center. Since DIMES
is a hardware validation tool, its hardware implementation is constrained to
a dual processor where each processor has four thread units.
DIMES memory is restricted to 16K of local scratch-pad
memory per processor and 64K of global shared memory. Additionally, DIMES is linked
to a host computer for I/O. We have chosen to use a Mandelbrot-set generator
(written in C++) with a work-stealing algorithm as our metric to evaluate the
programming model on DIMES.
The Mandelbrot-set generator has been threaded, and the work-stealing algorithm
achieves load balancing between the DIMES threads. The Mandelbrot example
demonstrates the effective use of DIMES threads, the effective use of DIMES
scratch-pad memory and the effective use of DIMES global memory in its CRTS
environment. The results of the study are highly promising and show that DIMES
is an ideal hardware tool for validating future Cyclops enhancements.
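
As an illustration of the approach described above, the following is a minimal sketch of a threaded Mandelbrot-set generator with work stealing. It is not the DIMES implementation: it assumes standard C++ threads (std::thread) in place of the DIMES thread units and CRTS environment, and the names RowQueue, escape_time and mandelbrot, the row-per-task granularity and the viewing window are all hypothetical choices made for this example. The default of eight workers mirrors the two processors with four thread units each.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Escape-time iteration count for the point c (standard Mandelbrot test).
int escape_time(std::complex<double> c, int max_iter) {
    std::complex<double> z(0.0, 0.0);
    int n = 0;
    while (n < max_iter && std::norm(z) <= 4.0) {
        z = z * z + c;
        ++n;
    }
    return n;
}

// Hypothetical per-worker queue of image rows, guarded by a mutex.
class RowQueue {
public:
    void push(int row) {
        std::lock_guard<std::mutex> lock(m_);
        rows_.push_back(row);
    }
    // The owning worker takes rows from the back of its own queue.
    bool pop(int& row) {
        std::lock_guard<std::mutex> lock(m_);
        if (rows_.empty()) return false;
        row = rows_.back();
        rows_.pop_back();
        return true;
    }
    // A thief takes rows from the front of another worker's queue.
    bool steal(int& row) {
        std::lock_guard<std::mutex> lock(m_);
        if (rows_.empty()) return false;
        row = rows_.front();
        rows_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::deque<int> rows_;
};

// Renders the window [-2, 1) x [-1.5, 1.5) as iteration counts.
// Assumption: 8 workers stand in for 2 processors x 4 thread units.
std::vector<int> mandelbrot(int width, int height, int max_iter,
                            unsigned workers = 8) {
    assert(width > 0 && height > 0 && workers > 0);
    std::vector<int> image(static_cast<std::size_t>(width) * height);
    std::vector<RowQueue> queues(workers);

    // Initial distribution: contiguous bands of rows. Bands differ in cost,
    // so some workers finish early and must steal to stay busy.
    for (int y = 0; y < height; ++y)
        queues[static_cast<std::size_t>(y) * workers / height].push(y);

    auto worker = [&](unsigned id) {
        for (;;) {
            int y = 0;
            bool found = queues[id].pop(y);
            for (unsigned k = 1; !found && k < workers; ++k)
                found = queues[(id + k) % workers].steal(y);
            // No rows are added after start-up, so empty queues stay empty.
            if (!found) return;
            for (int x = 0; x < width; ++x) {
                std::complex<double> c(-2.0 + 3.0 * x / width,
                                       -1.5 + 3.0 * y / height);
                image[static_cast<std::size_t>(y) * width + x] =
                    escape_time(c, max_iter);
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned id = 0; id < workers; ++id) pool.emplace_back(worker, id);
    for (std::thread& t : pool) t.join();
    return image;
}
```

Calling mandelbrot(640, 480, 256) returns one iteration count per pixel in row-major order. Each row is claimed by exactly one worker, so the writes into the image never overlap and need no further locking. A real DIMES port would additionally have to decide which data lives in the 16K scratch-pad memory and which in the 64K global memory; this sketch does not model that.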