News
Samba is a simple yet powerful hybrid model with an unlimited context length. Its architecture is frustratingly simple: Samba = Mamba + MLP + Sliding Window Attention + MLP stacking at the layer level ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results