We present an approach to automatic speaker verification through linguistically constrained i-vector systems based on formant frequencies.
An analysis of discriminative and calibration properties is presented for every linguistic unit (phones and diphones).
An analysis of the best-performing units for different speakers reveals remarkable speaker-dependent specificities.
Different approaches for selection and fusion of different linguistic units are also analysed.
The fusion of cepstral-based and formant-based systems yields improved performance.