Bunny is a family of lightweight but powerful multimodal models. It offers multiple plug-and-play vision encoders, such as EVA-CLIP and SigLIP, and language backbones, including Llama-3-8B, Phi-3-mini, Phi-1 ...
Tsinghua University, Shanghai AI Laboratory (correspondence: Jingbo Wang and Bo Dai). This work introduces MotionLCM, which extends controllable motion generation to real-time speeds. Existing ...