In the context of audiovisual indexing, we propose a system for the automatic
association of
voices and images. This association can be used as a preprocessing step for
existing applications such as person identification systems. We use a fusion of
audio and video indexes (without any prior knowledge) in order to make the
information brought by each of them more robust. If both audio and video
indexes are correctly segmented, this automatic association yields excellent
results. To deal with oversegmentation, we propose an approach that
uses one index to improve the segmentation of the other. We show that the use
of the audio index improves an oversegmented video index on a corpus composed
of French TV broadcasts.
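
As an illustration only, the idea of associating voices and images and then using the audio index to repair an oversegmented video index can be sketched as follows. This is a minimal sketch under assumptions that are not taken from the system itself: each index is assumed to be a list of (start, end, label) segments, the association is assumed to be driven by total temporal overlap, and the function names (overlap, cooccurrence, associate, merge_oversegmented) are hypothetical.

```python
from collections import defaultdict


def overlap(a, b):
    """Duration of the temporal intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def cooccurrence(audio_index, video_index):
    """Total overlap duration for every (voice, face) label pair."""
    totals = defaultdict(float)
    for a_start, a_end, voice in audio_index:
        for v_start, v_end, face in video_index:
            d = overlap((a_start, a_end), (v_start, v_end))
            if d > 0:
                totals[(voice, face)] += d
    return totals


def associate(audio_index, video_index):
    """Map each face label to the voice label it overlaps with the most."""
    best = {}
    for (voice, face), d in cooccurrence(audio_index, video_index).items():
        if face not in best or d > best[face][1]:
            best[face] = (voice, d)
    return {face: voice for face, (voice, _) in best.items()}


def merge_oversegmented(video_index, face_to_voice):
    """Give one label to all face labels associated with the same voice.

    Assumption: two face labels that share a voice are the same person.
    The first face label seen for a voice becomes the canonical one.
    """
    canonical = {}
    merged = []
    for start, end, face in video_index:
        voice = face_to_voice.get(face)
        if voice is None:
            # No voice overlaps this face: keep its label unchanged.
            merged.append((start, end, face))
        else:
            merged.append((start, end, canonical.setdefault(voice, face)))
    return merged
```

For example, with audio segments [(0, 10, "spk1"), (10, 20, "spk2"), (20, 30, "spk1")] and video segments [(0, 10, "faceA"), (10, 20, "faceB"), (20, 30, "faceC")], the sketch associates both faceA and faceC with spk1 and relabels the third video segment as faceA. The overlap criterion is deliberately simple and would need refinement for cases where the speaker is not the person on screen.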