An Improved Approach to Audio Segmentation and Classification in Broadcasting Industries

Author(s): Jingzhou Sun (School of Computer Science and Cybersecurity, Communication of China, Beijing, China) and Yongbin Wang (School of Computer Science and Cybersecurity, Communication of China, Beijing, China)
Copyright: 2019
Volume: 30
Issue: 2
Pages: 23
Source title: Journal of Database Management (JDM)
Editor(s)-in-Chief: Keng Siau (City University of Hong Kong, Hong Kong SAR)
DOI: 10.4018/JDM.2019040103

Keywords: Data Mining and Databases / Databases / Information Science Reference / Library & Information Science

Abstract

Audio segmentation and classification are the basis of audio processing in broadcasting industries. This article proposes a Dual-CNN (Dual-Convolutional Neural Network) method in which a CNN can be pre-trained on unlabeled audio data to cope with the scarcity of labeled data. Auto-encoders (including an encoder and a decoder) are used, hence the name “Dual.” First, the audio sampling points and the STFT (Short-Time Fourier Transform) spectrograms derived from them pass through their own CNNs. The extracted features are then fused. Finally, the merged features are fed to a fully connected network, and the classification results are produced via Softmax. As a segmentation-by-classification approach, the solution also presents a novel smoothing method (SEG-smoothing) in order to deliver the best segmentation result. A series of experiments verifies that the proposed approach to segmentation and classification outperforms alternative solutions.
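
The abstract does not describe how SEG-smoothing works, so the sketch below is not that method. It only illustrates the general segmentation-by-classification idea under stated assumptions: a classifier has already assigned one label to each fixed-length frame, a plain sliding-window majority vote (a swapped-in, generic technique) removes isolated misclassifications, and runs of equal labels are merged into segments. The function names, the window size, and the one-second frame length are all hypothetical.

```python
from collections import Counter
from itertools import groupby


def smooth_labels(labels, window=3):
    """Generic majority-vote smoothing (an assumption, NOT the article's SEG-smoothing).

    Each frame label is replaced by the most common label in a window
    centered on that frame; the window is truncated at the edges.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighborhood = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed


def labels_to_segments(labels, frame_seconds=1.0):
    """Merge runs of identical frame labels into (start, end, label) tuples."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start * frame_seconds, (start + length) * frame_seconds, label))
        start += length
    return segments


# Hypothetical per-frame classifier output with one isolated "music" frame.
frame_labels = ["speech", "speech", "music", "speech", "speech", "music", "music", "music"]
smoothed = smooth_labels(frame_labels, window=3)
segments = labels_to_segments(smoothed)
print(segments)
```

With these inputs, the isolated "music" frame at index 2 is absorbed into the surrounding speech, leaving one speech segment followed by one music segment. A larger window smooths more aggressively at the cost of blurring short genuine segments.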
