Abstract: Modeling human-object interaction for human motion prediction presents significant challenges due to limited data availability. While pre-training on large-scale datasets offers a promising ...
Abstract: The remarkable performance of Transformer architectures in the field of Natural Language Processing (NLP) has been extended to 2D computer vision, achieving tremendous success in various ...