Title CAT: Cellular Automata on Tensor Cores
Authors Cristobal Navarro, Felipe Quezada, Enzo Meneses, Héctor Ferrada, Nancy Hitschfeld
Publication date December 2024
Abstract Cellular automata (CA) are simulation models that
can produce complex emergent behaviors from simple local rules.
Although state-of-the-art GPU solutions are already fast due to
their data-parallel nature, their performance can rapidly degrade
in CA with a large neighborhood radius. With the inclusion
of tensor cores across the entire GPU ecosystem, interest has
grown in finding ways to leverage these fast units outside the
field of artificial intelligence, which was their original purpose.
In this work, we present CAT, a GPU tensor core approach that
can accelerate CA in which the cell transition function acts on
a weighted summation of its neighborhood. CAT is evaluated
theoretically, using an extended PRAM cost model, as well as
empirically using the Larger Than Life (LTL) family of CA as
case studies. The results confirm that the cost model is accurate,
showing that CAT exhibits constant time throughout the entire
radius range 1 ≤ r ≤ 16, and its theoretical speedups agree
with the empirical results. At low radius r = 1, 2, CAT is
competitive and is only surpassed by the fastest state-of-the-art
GPU solution. Starting from r = 3, CAT progressively
outperforms all other approaches, reaching speedups of up to
101× over a GPU baseline and up to ~14× over the fastest
state-of-the-art GPU approach. In terms of energy efficiency, CAT is
competitive in the range 1 ≤ r ≤ 4, and for r ≥ 5 it is the most
energy-efficient approach. As for performance scaling across GPU
architectures, CAT shows a promising trend that, if it continues in
future generations, would increase its performance at a higher
rate than classical GPU solutions. A CPU version of CAT was
also explored, using the recently introduced AMX instructions.
Although its performance is still below GPU tensor cores, it
is a promising approach as it can still outperform some GPU
approaches at large radius. The results obtained in this work position
CAT as an approach with great potential for scientists who need
to study emerging phenomena on CA with large neighborhood
radius, on both GPU and CPU.
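Illustrative sketch (not part of the abstract, and not the authors' CUDA implementation): the abstract's key condition is that the transition function acts on a weighted summation of the neighborhood. One way to see why such a sum suits matrix-multiply hardware is that a box sum of radius r can be written as a product with two band matrices of ones. The NumPy code below assumes a Moore neighborhood, zero (non-periodic) boundaries, a count that excludes the center cell, and closed birth/survival intervals; all function names are hypothetical.

```python
import numpy as np

def band_matrix(n, r):
    """n x n matrix with ones where |i - j| <= r, zeros elsewhere."""
    idx = np.arange(n)
    return (np.abs(idx[:, None] - idx[None, :]) <= r).astype(np.int64)

def neighbor_sums_matmul(grid, r):
    """Radius-r neighborhood sums expressed as two matrix products.

    The left product sums over rows within distance r, the right product
    sums over columns within distance r; subtracting the grid removes the
    center cell (an assumption of this sketch).
    """
    n_rows, n_cols = grid.shape
    return band_matrix(n_rows, r) @ grid @ band_matrix(n_cols, r) - grid

def neighbor_sums_direct(grid, r):
    """Reference implementation: sum each clipped window explicitly."""
    n_rows, n_cols = grid.shape
    out = np.zeros_like(grid)
    for i in range(n_rows):
        for j in range(n_cols):
            window = grid[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = window.sum() - grid[i, j]
    return out

def ltl_step(grid, r, birth, survive):
    """One step of a Larger-Than-Life-style rule on a 0/1 grid.

    birth and survive are (low, high) closed intervals on the neighbor sum.
    """
    s = neighbor_sums_matmul(grid, r)
    born = (grid == 0) & (s >= birth[0]) & (s <= birth[1])
    stays = (grid == 1) & (s >= survive[0]) & (s <= survive[1])
    return (born | stays).astype(grid.dtype)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.integers(0, 2, size=(32, 32), dtype=np.int64)
    for r in (1, 4, 16):
        same = np.array_equal(neighbor_sums_matmul(g, r), neighbor_sums_direct(g, r))
        print(f"r={r}: matmul matches direct sum: {same}")
```

In this formulation the radius only changes the contents of the band matrices, not the number of multiplications, which is consistent in spirit with the constant-time behavior over 1 ≤ r ≤ 16 reported in the abstract. With r = 1, birth = (3, 3) and survive = (2, 3), `ltl_step` reduces to Conway's Game of Life.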
Pages 341-355
Volume 36
Journal name IEEE Transactions on Parallel and Distributed Systems
Publisher IEEE Computer Society Press (Los Alamitos, CA, USA)