
We implemented DMPK (Diamond Matrix Powers Kernel) with MPI parallelization and an optimized assignment of tasks to processors. We also analyzed the amount of communication and redundant computation for different numbers of phases and compared the results with PA1 and PA2, the established matrix powers kernel methods. These results were presented at HPCAsia2020 in Fukuoka.
Matrix Powers Kernel (MPK) algorithms calculate the vector A^k x, obtained by multiplying an initial vector x by the k-th power of a matrix A. Our algorithm, Diamond Matrix Powers Kernel (DMPK), generalizes the MPK algorithms PA1 and PA2 by Demmel et al. PA1 and PA2 can be used for general matrices. They improve performance by reducing the amount of communication, which is often the bottleneck, but they introduce redundant computation. In scientific computations with regular access patterns, diamond tiling algorithms achieve similar communication avoidance without introducing any redundant computation, by using moving index domains. By combining these two approaches, DMPK is applicable to general matrices and makes it possible to reduce the amount of redundant computation at the price of a slightly higher amount of communication. This is done by translating the concept of moving index domains to general matrices: the algorithm is performed in “phases”, and after each phase the graph corresponding to the matrix is repartitioned.
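
The trade-off between communication and redundant computation can be illustrated with a small sketch. It is not the DMPK implementation and not Demmel et al.'s code. It is a minimal, single-process illustration in the spirit of PA1: one partition fetches every remote entry of x it needs once, then computes its part of Ax, ..., A^k x with no further communication, at the cost of recomputing some entries owned by other partitions. The function names (`mpk_naive`, `mpk_local_redundant`, `tridiagonal`) and the list-of-dicts matrix format are assumptions made for this example.

```python
def spmv(A, x):
    """Sparse matrix-vector product; A is a list of rows, each a {column: value} dict."""
    return [sum(v * x[j] for j, v in row.items()) for row in A]


def mpk_naive(A, x, k):
    """Baseline: return [x, Ax, ..., A^k x] using k successive products."""
    vecs = [list(x)]
    for _ in range(k):
        vecs.append(spmv(A, vecs[-1]))
    return vecs


def mpk_local_redundant(A, x, k, owned):
    """PA1-style sketch (hypothetical helper) for a single partition.

    Returns (vals, ghost, redundant), where vals[s] maps a row index to the
    corresponding entry of A^s x, ghost is the number of remote entries of x
    that must be fetched up front, and redundant is the number of entries
    computed here although another partition owns them.
    """
    owned = set(owned)
    # needed[s] = indices of A^s x this partition must know.
    needed = [set() for _ in range(k + 1)]
    needed[k] = set(owned)
    for s in range(k, 0, -1):
        needed[s - 1] = set(owned)          # keep local parts of every level
        for i in needed[s]:
            needed[s - 1].update(A[i])      # column indices of row i
    # One "communication" step: gather all needed entries of x.
    vals = [{j: x[j] for j in needed[0]}]
    # k local steps without any further communication.
    for s in range(1, k + 1):
        prev = vals[-1]
        vals.append({i: sum(v * prev[j] for j, v in A[i].items())
                     for i in needed[s]})
    ghost = len(needed[0] - owned)
    redundant = sum(len(needed[s] - owned) for s in range(1, k + 1))
    return vals, ghost, redundant


def tridiagonal(n):
    """Small test matrix: 2 on the diagonal, 1 on the two off-diagonals."""
    return [{j: (2.0 if j == i else 1.0)
             for j in (i - 1, i, i + 1) if 0 <= j < n}
            for i in range(n)]


if __name__ == "__main__":
    A = tridiagonal(8)
    x = [1.0] * 8
    vals, ghost, redundant = mpk_local_redundant(A, x, 3, owned=range(4))
    print("ghost entries fetched:", ghost)
    print("redundantly computed entries:", redundant)
```

In this sketch both counters grow with k, because the set of needed indices expands by one graph neighborhood per level. Repartitioning between phases, as described above, is what lets DMPK exchange some of that redundant work for additional communication steps.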