Abstract: Benefiting from convolutional neural networks (CNNs), monocular depth estimation has achieved remarkable performance. However, most CNNs focus on learning local details of the scene, ...
Abstract: Vision Transformers (ViT) and Visual Prompt Tuning (VPT) achieve state-of-the-art performance with improved efficiency in various computer vision tasks. This suggests a promising paradigm ...