News
Recently, vision-language models based on transformers are gaining popularity for joint modeling of visual and textual modalities. In particular, they show impressive results when transferred to ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results