A Python tool that measures whether sin(x) and cos(x) are implemented as standard (error guaranteed ≤2 ULP) or as approximate “fast-math” versions, and compares the results across multiple GPU backends (Vulkan, D3D12, ...
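As a rough illustration of the kind of check described above, the following is a minimal CPU-only sketch, not the tool's actual code. It measures the worst-case float32 ULP error of a sine implementation against `math.sin` and applies the 2 ULP threshold from the description. All function names are hypothetical, `fast_sin` is only a stand-in for an approximate implementation, and the sample range and count are assumptions; nothing here talks to a GPU backend.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_ulp(x: float) -> float:
    """Gap between float32 |x| and the next float32 further from zero."""
    bits = struct.unpack("<I", struct.pack("<f", abs(x)))[0]
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - abs(x)


def ulp_error(approx: float, reference: float) -> float:
    """Error of `approx`, in float32 ULPs measured at the reference value."""
    return abs(approx - reference) / f32_ulp(to_f32(reference))


def fast_sin(x: float) -> float:
    """Hypothetical stand-in for a fast-math sine: 7th-order Taylor polynomial."""
    x2 = x * x
    return to_f32(x * (1 - x2 / 6 * (1 - x2 / 20 * (1 - x2 / 42))))


def max_ulp_error(fn, lo: float, hi: float, samples: int = 10001) -> float:
    """Worst ULP error of fn over evenly spaced float32 inputs in [lo, hi]."""
    worst = 0.0
    for i in range(samples):
        x = to_f32(lo + (hi - lo) * i / (samples - 1))
        worst = max(worst, ulp_error(fn(x), math.sin(x)))
    return worst


def classify(fn, lo: float = -math.pi, hi: float = math.pi,
             threshold: float = 2.0) -> str:
    """Label fn 'standard' if its worst error is within `threshold` ULP."""
    return "standard" if max_ulp_error(fn, lo, hi) <= threshold else "fast-math"
```

Calling `classify(lambda x: to_f32(math.sin(x)))` treats a correctly rounded float32 sine as the implementation under test, while `classify(fast_sin)` exercises the polynomial stand-in, whose error grows quickly near ±π. A real measurement would replace these callables with values read back from a shader run on each backend.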
Vulkan v1.50.2 uses VRAM to load models as expected. When loading gpt-oss 120b, I can clearly see that 60 GB of VRAM gets allocated, and the GPU load is ~50% when the text is being ...