Running large language models on your desktop depends as much on your accuracy needs as on your GPU, and the key to performance is fitting the model into video memory. Recently, I have been doing a lot ...