Abstract: Audio-visual event localization (AVEL) aims to identify both the categories and temporal boundaries of events that are both audible and visible in unconstrained videos. However, the inherent ...
Abstract: Currently, audio-visual speech separation methods utilize the speaker's audio and visual correlation information to help separate the speech of the target speaker. However, these methods ...