In the context of audiovisual indexing, we propose a system that automatically associates voices with images. This association can serve as a preprocessing step for existing applications such as person identification systems. We fuse the audio and video indexes, without any prior knowledge, in order to make the information carried by each of them more robust. When both the audio and video indexes are correctly segmented, this automatic association yields excellent results. To deal with oversegmentation, we propose an approach that uses one index to improve the segmentation of the other. We show, on a corpus of French TV broadcasts, that using the audio index improves an oversegmented video index.
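
As an illustration only, the idea of fusing two indexes can be sketched as follows. The sketch assumes that each index is a list of (start, end, label) segments in seconds, that a voice is associated with the image class it co-occurs with for the longest total duration, and that video classes sharing the same dominant voice are merged. These choices, and all function names (`cooccurrence`, `associate`, `merge_oversegmented`), are assumptions made for this example and are not taken from the system described above.

```python
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Duration (in seconds) shared by two time intervals, 0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def cooccurrence(audio_index, video_index):
    """Total shared duration for every (voice label, image label) pair."""
    table = defaultdict(float)
    for a_start, a_end, voice in audio_index:
        for v_start, v_end, image in video_index:
            shared = overlap(a_start, a_end, v_start, v_end)
            if shared > 0:
                table[(voice, image)] += shared
    return dict(table)


def associate(audio_index, video_index):
    """Map each voice to the image class it co-occurs with the longest."""
    best = {}
    for (voice, image), duration in cooccurrence(audio_index, video_index).items():
        if voice not in best or duration > best[voice][1]:
            best[voice] = (image, duration)
    return {voice: image for voice, (image, _) in best.items()}


def merge_oversegmented(audio_index, video_index):
    """Relabel video segments so that image classes dominated by the same
    voice collapse onto that voice's main image class (a naive assumption)."""
    voice_to_image = associate(audio_index, video_index)
    image_to_voice, best_duration = {}, {}
    for (voice, image), duration in cooccurrence(audio_index, video_index).items():
        if duration > best_duration.get(image, 0.0):
            best_duration[image] = duration
            image_to_voice[image] = voice
    relabeled = []
    for start, end, image in video_index:
        voice = image_to_voice.get(image)
        # Keep the original label when no voice overlaps this image class.
        relabeled.append((start, end, voice_to_image.get(voice, image)))
    return relabeled


# Toy data: image class "I3" is an oversegmented copy of "I1".
audio_index = [(0.0, 10.0, "V1"), (10.0, 20.0, "V2"), (20.0, 30.0, "V1")]
video_index = [(0.0, 10.0, "I1"), (10.0, 20.0, "I2"), (20.0, 28.0, "I3")]

print(associate(audio_index, video_index))
# {'V1': 'I1', 'V2': 'I2'}
print(merge_oversegmented(audio_index, video_index))
# [(0.0, 10.0, 'I1'), (10.0, 20.0, 'I2'), (20.0, 28.0, 'I1')]
```

In this toy case the audio index tells us that "I1" and "I3" accompany the same voice, so the two video classes are merged. In real broadcasts the person speaking is not always the person on screen, so a practical system would need a more careful criterion than this duration-based rule.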