Abstract: Audio-visual event localization (AVEL) aims to identify the categories and temporal boundaries of events that are both audible and visible in unconstrained videos. However, the inherent ...
Welcome to the companion repository for our position paper on Music Performance Audio-Visual Question Answering (Music AVQA). This repo curates the datasets, benchmark results, and seminal methods ...