Audio event classification is the task of assigning one or more labels, such as ‘dog bark’ or ‘doorbell’, to an audio signal. The subtitles for this demo, which features different genres of music, are generated by a deep learning model that follows the Google AudioSet ontology. The subtitles are updated every second, and each update is based on a window of roughly 2-3 seconds of audio.
The subtitles for this second demo, which features general audio events, are also generated by a deep learning model that follows the Google AudioSet ontology. As before, the subtitles are updated every second, and each update is based on a window of roughly 2-3 seconds of audio.
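
The update scheme described for both demos (one subtitle per second, each computed from a window of roughly 2-3 seconds) can be sketched as a sliding window over the audio samples. This is only an illustration under stated assumptions, not the demos' actual code: the names `window_bounds`, `subtitles`, and `classify` are hypothetical, the window is assumed to start at each second and extend forward, and the 0.5 score threshold is an arbitrary choice. The `classify` argument stands in for the deep learning model and is assumed to return a score per label.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def window_bounds(num_samples: int, sample_rate: int,
                  window_s: float = 3.0, hop_s: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices, one window per hop.

    Assumption: each window starts at the hop position and extends
    forward; the last windows are clipped at the end of the signal.
    """
    window = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    bounds = []
    for start in range(0, num_samples, hop):
        end = min(start + window, num_samples)
        bounds.append((start, end))
    return bounds


def subtitles(samples: Sequence[float], sample_rate: int,
              classify: Callable[[Sequence[float]], Dict[str, float]],
              threshold: float = 0.5,
              window_s: float = 3.0, hop_s: float = 1.0) -> List[Tuple[float, List[str]]]:
    """Produce (time_in_seconds, labels) pairs, one per hop.

    `classify` is a placeholder for the model: it maps a chunk of
    samples to a {label: score} dictionary. Every label scoring at or
    above `threshold` is kept, so a window may get several labels.
    """
    lines = []
    for start, end in window_bounds(len(samples), sample_rate, window_s, hop_s):
        scores = classify(samples[start:end])
        kept = [label for label, score in scores.items() if score >= threshold]
        kept.sort(key=lambda label: -scores[label])  # highest score first
        lines.append((start / sample_rate, kept))
    return lines
```

Because consecutive windows overlap by about two seconds under these assumptions, neighbouring subtitles tend to share labels, which keeps the displayed text from flickering between updates.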