A study of checkpointing in large scale training of deep neural networks
Rojas, Elvis; Kahira, Albert Njoroge; Meneses, Esteban; Bautista-Gomez, Leonardo; Badia, Rosa M (arXiv.Org, 2021-03-29)
Deep learning (DL) applications are increasingly being deployed on HPC systems to leverage the massive parallelism and computing power of those systems. While significant effort has been put to facilitate distributed ...