Abstract: Transformer-based large language models (LLMs), shown in Fig. 20.5.1, have recently become widely used, and even on-device LLM systems with real-time responses are anticipated [1]. Many transformer ...