Abstract: Video question answering (Video-QA) has emerged as a core task in the vision-language domain, which requires models to understand a given video and answer textual questions related to ...