Abstract: In Large Language Model (LLM) training, activations constitute a significant portion of memory usage, and memory-side errors occurring in activations can ...