ABSTRACT: Video-text retrieval (VTR) is an essential task in multimodal learning, aiming to bridge the semantic gap between visual and textual data. Effective video frame sampling plays a crucial role ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results