Abstract

In this paper, we examine the use of Joint Factor Analysis (JFA) methods on RSR2015 Part III (digits) [1]. A tied-mixture HMM is used to segment the utterances into digits, while JFA and a trainable backend are deployed for feature extraction and log-likelihood ratio (LLR) calculation, respectively. A novel approach for digit-dependent fusion of UBM-component log-likelihood ratios is introduced, yielding the best results so far. The fusion of 5 different JFA features gives an equal-error rate of 3.6%, compared to 6.3% attained by a baseline GMM-UBM model with score normalization.

JFA for feature extraction

JFA vs. i-vectors
• The text-independent i-vector/PLDA paradigm has not been successful in text-dependent speaker recognition. The speaker-phrase variability is hard to confine to a low-dimensional subspace.
• JFA offers the flexibility of confining the channel effects to a subspace while allowing the speaker-phrase factors to lie in the supervector space [2].

Main JFA equation
$$S = m + Ux + Vy + Dz \qquad (1)$$
• The hidden variable x varies from one recording to another and is intended to model channel effects.
• In text-independent speaker recognition, the term Dz is usually dropped and speakers are characterized by the low-dimensional vector y. Here, we extract either z or y features [3].

JFA on utterances segmented into digits
• JFA can be extended to utterances that are segmented into HMM states (digits).
• Features can be global (digit-independent) or local (digit-dependent), and supervector-sized (z-vectors) or subspace (y-vectors).

Segmentation and Baum-Welch stats

Tied-Mixture HMM
• Train a UBM and use its means and covariance matrices as the codebook for a Tied-Mixture HMM (TMM).
• The TMM has a single Gaussian codebook and digit-dependent weights.
• It is very efficient to train and evaluate (Viterbi algorithm).
• We also use it, instead of the UBM, to extract Baum-Welch stats for local features.
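The efficiency of the tied-mixture structure can be illustrated with a minimal sketch in Python with NumPy. It is not the authors' implementation: it assumes diagonal covariances, one HMM state per digit, a known left-to-right digit order, and uniform transition probabilities, and all function names are hypothetical. The codebook log-likelihoods are computed once per frame and shared by every digit, which contributes only its own weight vector.

```python
import numpy as np

def codebook_loglik(frames, means, variances):
    """Log-density of every frame under every shared Gaussian.

    frames: (T, D); means, variances: (C, D), diagonal covariances (assumption).
    Returns an array of shape (T, C).
    """
    diff = frames[:, None, :] - means[None, :, :]               # (T, C, D)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (C,)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, C)
    return -0.5 * (log_det[None, :] + mahal)

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def digit_emissions(cb_ll, log_weights):
    """Per-digit emission log-likelihoods from the shared codebook.

    cb_ll: (T, C); log_weights: (S, C), one weight vector per digit state.
    Returns an array of shape (T, S).
    """
    return logsumexp(cb_ll[:, None, :] + log_weights[None, :, :], axis=2)

def forced_alignment(emis):
    """Viterbi path through a left-to-right chain of digit states.

    emis: (T, S) with states in spoken order; transitions are treated as
    uniform (assumption). Returns the state index of each frame.
    """
    T, S = emis.shape
    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = emis[0, 0]                  # the path must start in the first digit
    for t in range(1, T):
        for s in range(S):
            stay = score[t - 1, s]
            move = score[t - 1, s - 1] if s > 0 else -np.inf
            if move > stay:
                score[t, s] = move + emis[t, s]
                back[t, s] = s - 1
            else:
                score[t, s] = stay + emis[t, s]
                back[t, s] = s
    path = np.empty(T, dtype=int)
    path[-1] = S - 1                          # the path must end in the last digit
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

The frame-to-digit path returned by `forced_alignment` is what a system of this kind would use to accumulate Baum-Welch statistics separately for each digit.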
Training and evaluating the system

Training the JFA and backend
• Train a JFA model using both local and global features, z- or y-vectors. (Several combinations are possible.)
• Extract z- or y-vectors and project them onto the unit sphere.
• Train a Joint-Density Backend (JDB) per feature.

Evaluating the model
• Apply Viterbi segmentation, extract z- or y-vectors, and use the JDB to calculate LLRs for each trial.
• Apply score normalization and fuse the score-normalized LLRs coming from multiple features.

Joint-Density Backend

An Alternative to PLDA
• We model the joint distribution of pairs of enrollment and test vectors under the same-speaker hypothesis [4].
• We use "target" trials from the training set, $t = [y_e^T, y_t^T]^T$.
• We estimate the mean and covariance matrix (C). Assuming zero mean, C is as follows:
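A joint-density backend of this kind can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the vectors are zero-mean, the joint density is Gaussian, C is partitioned into D-by-D blocks, and the different-speaker hypothesis treats the enrollment and test vectors as independent with the marginal blocks of C as their covariances. All function names are hypothetical.

```python
import numpy as np

def length_normalize(x):
    """Project vectors onto the unit sphere (row-wise for 2-D input)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def estimate_joint_cov(enroll, test):
    """Second-moment estimate of C from target-trial pairs, assuming zero mean.

    enroll, test: (N, D) arrays where row i of each comes from the same speaker.
    Returns the (2D, 2D) matrix C for the stacked vector t = [y_e; y_t].
    """
    t = np.hstack([enroll, test])             # (N, 2D)
    return t.T @ t / t.shape[0]

def gauss_logpdf_zero_mean(x, cov):
    """Log-density of a zero-mean Gaussian with covariance `cov` at vector x."""
    d = x.shape[-1]
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def jdb_llr(y_e, y_t, C):
    """LLR of same-speaker (joint) vs. different-speaker (independent) hypotheses."""
    d = y_e.shape[0]
    t = np.concatenate([y_e, y_t])
    return (gauss_logpdf_zero_mean(t, C)
            - gauss_logpdf_zero_mean(y_e, C[:d, :d])    # enrollment marginal
            - gauss_logpdf_zero_mean(y_t, C[d:, d:]))   # test marginal
```

In this sketch the cross-covariance blocks of C carry all the speaker information: if they were zero, the joint density would factor into the two marginals and every trial would score an LLR of 0.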

Keywords:
Speech recognition, Computer science, Segmentation, Subspace topology, Normalization (sociology), Feature vector, Pattern recognition (psychology), Hidden Markov model, Feature extraction, Artificial intelligence
