Convolutional neural networks (CNNs) have transformed the landscape of audio classification by leveraging learned feature extraction over structured time-frequency inputs. In this research, various spectral and rhythm feature representations are evaluated using a deep CNN on the ESC-50 dataset, which consists of 2,000 labeled environmental audio recordings across 50 classes. The findings reveal that traditional features, namely mel-scaled spectrograms and mel-frequency cepstral coefficients (MFCCs), yield superior performance compared to the other investigated features, such as cyclic tempograms and chroma energy normalized statistics.
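To make the compared representations concrete, the following is a minimal NumPy sketch of how a mel-scaled spectrogram and MFCCs can be computed from raw audio. This is an illustrative implementation, not the paper's actual pipeline; frame size, hop length, and filter counts are assumed values, and practical work would typically use a library such as librosa.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(y, sr, n_fft=1024, hop=512, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # then pool spectral bins through the mel filterbank.
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)

def mfcc(y, sr, n_mfcc=13, **kwargs):
    # MFCCs: log-mel energies decorrelated with a DCT-II.
    logmel = np.log(mel_spectrogram(y, sr, **kwargs) + 1e-10)
    n_mels = logmel.shape[0]
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5)
                 * np.arange(n_mfcc)[:, None])
    return dct @ logmel  # (n_mfcc, n_frames)
```

Both features are 2-D arrays (frequency-like axis by time), which is what lets them be fed to a CNN like an image; the rhythm and chroma features evaluated in the paper have the same layout but capture tempo and pitch-class information instead.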
The implications of this research are significant for enhancing audio recognition systems across various applications, including environmental monitoring and automated sound classification. By optimizing feature selection, developers can build more robust systems that improve the accuracy and reliability of audio classification tasks. However, the reliance on specific features also raises concerns about potential limitations when encountering diverse audio data outside of the evaluated dataset.