
Speaker diarization is often performed as a first step for speaker or speech recognition systems, which work better when the input signal is segmented by speaker. A common approach to speaker diarization is agglomerative clustering, in which the acoustic data is first split into small segments and pairs of clusters are then merged until a stopping criterion is met. The speaker clusters often contain non-speech frames that reduce discrimination between speakers, creating problems when deciding which two clusters to merge and when to stop the clustering. In this paper, we present an algorithm that purifies the clusters by eliminating the non-discriminant frames, selected using a likelihood-based metric, when comparing two clusters. We show relative improvements of over 15.5% on three datasets from the most recent Rich Transcription (RT) evaluations.
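
To make the idea of frame purification concrete, the following is a minimal sketch and not the algorithm evaluated in the paper. It assumes that each cluster is modeled by a single diagonal-covariance Gaussian, that the likelihood-based metric is a per-frame log-likelihood ratio between the two clusters being compared, and that a fixed fraction of the lowest-scoring frames is discarded. The function names and the `drop_fraction` parameter are hypothetical.

```python
import numpy as np


def fit_diag_gaussian(frames):
    """Fit a diagonal-covariance Gaussian; returns (mean, variance)."""
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6  # small floor keeps the variance positive
    return mean, var


def frame_log_likelihood(frames, model):
    """Per-frame log-likelihood under a diagonal Gaussian model."""
    mean, var = model
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var, axis=1)


def purify(frames_a, frames_b, drop_fraction=0.2):
    """Drop the least discriminant frames of each cluster.

    Assumption (not from the paper): a frame is "non-discriminant" when its
    log-likelihood ratio between its own cluster and the other cluster is low.
    """
    model_a = fit_diag_gaussian(frames_a)
    model_b = fit_diag_gaussian(frames_b)
    purified = []
    for frames, own, other in ((frames_a, model_a, model_b),
                               (frames_b, model_b, model_a)):
        # High values mean the frame clearly favours its own cluster.
        llr = frame_log_likelihood(frames, own) - frame_log_likelihood(frames, other)
        n_keep = max(1, int(round(len(frames) * (1.0 - drop_fraction))))
        # Keep the n_keep highest-scoring frames, restored to time order.
        keep_idx = np.sort(np.argsort(llr)[-n_keep:])
        purified.append(frames[keep_idx])
    return purified[0], purified[1]
```

In an agglomerative system, a step like `purify` would be applied to each candidate pair before computing the merge score, so that the merge and stopping decisions rely only on the retained frames. How frames are actually scored and how many are removed are choices this sketch does not take from the source.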