Abstract: Large Vision-Language Models (LVLMs) have shown impressive capabilities across various domains, but their performance remains limited in dense perception and structured learning ...