OPTIMIZING PERFORMANCE ON MASSIVELY PARALLEL COMPUTERS USING A REMOTE MEMORY ACCESS PROGRAMMING MODEL
Parallel programming models are of paramount importance because they affect both the performance delivered by massively parallel systems and the productivity of the programmers seeking that performance. Advances in networks, multicore chips, and related technology continue to improve the efficiency of modern supercomputers; however, average application efficiency remains a small fraction of peak system efficiency. This research proposes techniques for optimizing application performance on supercomputers using the remote memory access (RMA) parallel programming model. The growing gaps between CPU-network and CPU-memory timescales are fundamental problems that must be addressed in the design of communication models as well as scalable parallel algorithms. This research validates the RMA model on account of its simplicity, its good hardware support on modern networks, and its possession of characteristics important for reducing the gap between peak system performance and achieved application performance.

The effectiveness of these optimizations is evaluated in the context of parallel linear algebra kernels. The current approach differs from other linear algebra algorithms in its explicit use of shared memory and remote memory access communication rather than message passing, and it is suitable for both clusters and scalable shared-memory systems. Experimental results on large-scale systems (a Linux InfiniBand cluster and a Cray XT) demonstrate consistent performance advantages over ScaLAPACK, the leading suite of parallel linear algebra algorithms in use today.
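The one-sided communication style the abstract refers to can be sketched in miniature: in an RMA model, an origin process writes ("puts") directly into memory exposed by a target process, with no matching receive on the target side. The sketch below is an illustration only, emulating that put/get pattern with Python's standard `multiprocessing.shared_memory` module as a stand-in for a real RMA runtime such as those the thesis targets; the function names, segment size, and 4-byte payload are assumptions for the example, not details from the thesis.

```python
# Illustrative sketch (not from the thesis): one-sided "put"/"get"
# semantics emulated with Python's multiprocessing.shared_memory.
from multiprocessing import Process, shared_memory

def put_worker(segment_name: str) -> None:
    # The origin process attaches to the target's exposed memory
    # and writes into it directly -- no receive is ever posted.
    shm = shared_memory.SharedMemory(name=segment_name)
    shm.buf[0:4] = (42).to_bytes(4, "little")
    shm.close()

def demo() -> int:
    # The "target" exposes a memory window...
    shm = shared_memory.SharedMemory(create=True, size=4)
    try:
        p = Process(target=put_worker, args=(shm.name,))
        p.start()
        p.join()  # stands in for an RMA synchronization ("fence")
        # ...and later reads ("gets") the value the origin deposited.
        return int.from_bytes(bytes(shm.buf[0:4]), "little")
    finally:
        shm.close()
        shm.unlink()

if __name__ == "__main__":
    print(demo())
```

The point of the pattern is that data movement is decoupled from synchronization: the put completes without the target's participation, which is what lets RMA exploit the hardware support on modern networks that the abstract mentions.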