Abstract
Digital technology relies on recorded speech signals in applications such as audio broadcasting, cell phones, and television. When speech is recorded in conference halls and meetings with several speakers, the quality of the recorded signal is reduced. A speaker tracking (diarization) system allows efficient identification of the speakers in a recorded speech signal. In this work, three novel speaker diarization systems are proposed to improve speaker clustering and segmentation. The first combines the Tangent weighted Mel-Frequency Cepstral Coefficient (TMFCC) with the Lion algorithm. The second combines the Multiple Kernel weighted Mel-Frequency Cepstral Coefficient (MKMFCC) with WLI-Fuzzy clustering. The third combines Holo-entropy with the eXtended Linear Prediction using autocorrelation Snapshot (HXLPS) model with a Deep Neural Network (DNN). The final proposed model, HXLPS with DNN, is evaluated by comparing it with the existing XLPS with DNN model. The proposed models are implemented on speech signals recorded from three-, four-, five-, six-, and seven-speaker systems. Performance is analyzed by varying the frame length and the lambda (λ) values, using the metrics tracking distance, tracking accuracy, precision, recall, F-Measure, false alarm rate, and diarization error rate. The simulation results show that the proposed TMFCC with Lion model performs better than existing models such as MFCC with ILP. The proposed MKMFCC with WLI-Fuzzy model has better average results than the TMFCC with Lion model. The HXLPS with DNN model outperforms both the TMFCC with Lion and the MKMFCC with WLI-Fuzzy models, with better speaker identification.
Across the tested lambda (λ) values, the proposed HXLPS with DNN model has the best overall performance, with average values of 0.63332, 0.886, 0.886, 0.87, 0.87, 0.20708, and 0.08322 for tracking distance, tracking accuracy, precision, recall, F-Measure, false alarm rate, and diarization error rate, respectively. Across the tested frame lengths, the proposed HXLPS with DNN model again has the best overall performance, with average values of 0.57058, 0.918, 0.902, 0.8938, 0.87524, 0.17698, and 0.0848 for tracking distance, tracking accuracy, precision, recall, F-Measure, false alarm rate, and diarization error rate, respectively.