Abstract:
This paper reports a parallel implementation of the Conjugate Gradient (CG)
algorithm on the EARTH multithreaded architecture. In our implementation,
a two-dimensional blocking approach balances the inter-phase and intra-phase
communication costs and thereby minimizes the overall communication cost,
i.e., the sum of the two. We exploit the distinguishing features of the
EARTH system, namely fine-grain, non-preemptive fibers, to overlap
communication with computation and to reduce the overhead of
synchronization operations.
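To make the communication structure of CG concrete, here is a minimal serial sketch in Python of the textbook CG iteration, with the matrix-vector product computed block by block on a simulated pr x pc grid of nodes. The grid shape, the helper names (block_bounds, blocked_matvec, cg) and this particular block layout are illustrative assumptions, not the EARTH implementation described above; dense storage stands in for the sparse NAS matrix, and no fibers or real messages are modeled.

```python
import math

def block_bounds(n, parts, k):
    # Split range(n) into `parts` contiguous, near-equal pieces; return piece k.
    base, extra = divmod(n, parts)
    start = k * base + min(k, extra)
    return start, start + base + (1 if k < extra else 0)

def blocked_matvec(A, x, pr, pc):
    """y = A x on a hypothetical pr x pc grid of nodes, simulated serially.

    Node (i, j) owns block A[rows_i, cols_j] and needs only the x segment
    cols_j; the partial products of the nodes in grid row i are then summed
    (a reduction along the grid row).
    """
    n = len(x)
    y = [0.0] * n
    for i in range(pr):
        r0, r1 = block_bounds(n, pr, i)
        for j in range(pc):
            c0, c1 = block_bounds(n, pc, j)
            x_seg = x[c0:c1]  # data node (i, j) would have to receive
            for r in range(r0, r1):
                partial = sum(A[r][c] * xv for c, xv in zip(range(c0, c1), x_seg))
                y[r] += partial  # row-wise reduction of partial sums
    return y

def dot(u, v):
    # In a distributed run this is a global reduction, i.e. a synchronization point.
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, pr=2, pc=2, tol=1e-10, max_iter=1000):
    """Textbook CG for a symmetric positive definite A, starting from x0 = 0."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # r0 = b - A x0 = b
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs) < tol:
            break
        Ap = blocked_matvec(A, p, pr, pc)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Usage: a small symmetric, diagonally dominant (hence SPD) tridiagonal system.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 3.0]]
b = [1.0, 2.0, 3.0, 4.0]
x = cg(A, b)
```

In this sketch, the x segments fetched per block and the row-wise partial sums in blocked_matvec, together with the two dot products per iteration, mark where communication and synchronization would occur in a distributed run; in this layout, changing pr and pc shifts traffic between the segment distribution and the row reductions.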
Experiments with the algorithm were performed on SemiManna (a simulator for
the EARTH MANNA machine) and on the ChibaCity cluster at Argonne National
Laboratory (ANL). We observed substantial improvements over a traditional
parallel CG implementation. For example, on the NAS CG benchmark Class C, we
achieved a relative speedup of 41 on 64 nodes of the ChibaCity cluster
(Ethernet interconnect). On the MANNA machine, the simulated results show
even better speedup curves, owing to its higher ratio of communication speed
to computation speed. We likewise expect better speedup curves on clusters
with a Myrinet interconnect.
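For reference, the ChibaCity result can be restated as a parallel efficiency. This assumes the usual definition of relative speedup, $S_p = T_1 / T_p$ (run time on one node divided by run time on $p$ nodes), which the abstract does not spell out:

```latex
E_{64} = \frac{S_{64}}{64} = \frac{41}{64} \approx 0.64
```

That is, roughly 64% of ideal linear speedup on 64 Ethernet-connected nodes.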