SPEECHSPLIT unofficial code demo



Reference paper:
Qian, K., Zhang, Y., Chang, S., Cox, D., & Hasegawa-Johnson, M. (2020). Unsupervised speech decomposition via triple information bottleneck. arXiv preprint arXiv:2004.11284.


GitHub code:
https://github.com/CODEJIN/SpeechSplit


Structure



VCTK 108-speaker version (106 trained, 2 unseen)


Result datasets

Speaker      Category               Source
VCTK P243    Consistent (Male)
VCTK P240    Consistent (Female)
VCTK P232    Trained (Male)
VCTK P277    Trained (Female)
VCTK P226    Unseen (Male)
VCTK P228    Unseen (Female)

Summary

Speaker    Source    Conversion
P243                 P243 (Consistent)
                     P232 (Trained, male)
                     P277 (Trained, female)
                     P226 (Unseen, male)
                     P228 (Unseen, female)
P240                 P240 (Consistent)
                     P232 (Trained, male)
                     P277 (Trained, female)
                     P226 (Unseen, male)
                     P228 (Unseen, female)
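
The Summary table above pairs each source speaker with itself (the "Consistent" case) and with four other target speakers. As a rough sketch only, the same grid can be enumerated in Python. The speaker IDs and category labels are copied from the tables on this page; the names SOURCES, TARGETS, and conversion_pairs are hypothetical and are not taken from the linked repository.

```python
# Speaker IDs and category labels copied from the tables on this page.
# SOURCES, TARGETS, and conversion_pairs are illustrative names only.
SOURCES = ["P243", "P240"]
TARGETS = {
    "P232": "Trained, male",
    "P277": "Trained, female",
    "P226": "Unseen, male",
    "P228": "Unseen, female",
}


def conversion_pairs(sources=SOURCES, targets=TARGETS):
    """Return (source, target, category) tuples in the table's row order."""
    pairs = []
    for src in sources:
        # Each source is first converted to itself ("Consistent").
        pairs.append((src, src, "Consistent"))
        # Then to every trained and unseen target speaker.
        for tgt, category in targets.items():
            pairs.append((src, tgt, category))
    return pairs


if __name__ == "__main__":
    for src, tgt, category in conversion_pairs():
        print(f"{src} -> {tgt} ({category})")
```

With two sources and five targets each, this gives ten source/target pairs, matching the ten rows of the Summary table.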

Combinations

Speaker Rhythm Content Pitch