In the past few years, discriminative approaches to speaker detection have shown
good results and attracted increasing interest. Among these methods, SVM-based
systems
offer several advantages, especially their ability to handle a high-dimensional
feature space. Generative systems such as GMM/UBM systems achieve the best
performance among existing systems in speaker verification tasks. Combining
generative and discriminative approaches is not a new idea: it has been studied
several times by mapping a whole speech utterance onto a fixed-length vector.
This paper presents a straightforward, low-cost method to combine the two
approaches that uses only a UBM to drive the experiment. We show
that the TFLLR kernel, while closely related to a reduced form of
the Fisher mapping, yields performance close to that of a standard
GMM/UBM-based speaker detection system. Moreover, we show that combining
both systems outperforms either system taken independently.