Parallel Voice Conversion GAN demo



Reference papers

Duan, Z., Fang, H., Li, B., Sim, K. C., & Wang, Y. (2013, October). The NUS sung and spoken lyrics corpus: A quantitative comparison of singing and speech. In 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (pp. 1-9). IEEE.

Oord, A. V. D., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., ... & Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

Yamamoto, R., Song, E., & Kim, J. M. (2020, May). Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6199-6203). IEEE.

Deng, C., Yu, C., Lu, H., Weng, C., & Yu, D. (2020, May). PitchNet: Unsupervised singing voice conversion with pitch adversarial network. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7749-7753). IEEE.

Nachmani, E., & Wolf, L. (2019). Unsupervised singing voice conversion. arXiv preprint arXiv:1904.06590.

Jung, S., & Kim, H. (2020). Pitchtron: Towards audiobook generation from ordinary people's voices. arXiv preprint arXiv:2005.10456.

Zhang, Y., Weiss, R. J., Zen, H., Wu, Y., Chen, Z., Skerry-Ryan, R. J., ... & Ramabhadran, B. (2019). Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning. arXiv preprint arXiv:1907.04448.



GitHub code:
https://github.com/CODEJIN/PVCGAN_Torch



Structure
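
The model follows the Parallel WaveGAN recipe cited above (Yamamoto et al., 2020), whose central ingredient is a multi-resolution STFT auxiliary loss on the generated waveform. A minimal NumPy sketch of that loss is below; the function names are illustrative, and the FFT/hop/window settings are the three resolutions listed in the Parallel WaveGAN paper:

```python
import numpy as np

def stft_magnitude(x, fft_size, hop_size, win_size):
    """Magnitude STFT of a 1-D signal using a Hann window."""
    window = np.hanning(win_size)
    frames = []
    for start in range(0, len(x) - win_size + 1, hop_size):
        frame = x[start:start + win_size] * window
        frames.append(np.abs(np.fft.rfft(frame, n=fft_size)))
    return np.stack(frames)  # shape: (num_frames, fft_size // 2 + 1)

def stft_loss(y, y_hat, fft_size, hop_size, win_size, eps=1e-7):
    """Single-resolution loss: spectral convergence + log-magnitude L1."""
    S = stft_magnitude(y, fft_size, hop_size, win_size)
    S_hat = stft_magnitude(y_hat, fft_size, hop_size, win_size)
    sc = np.linalg.norm(S - S_hat) / (np.linalg.norm(S) + eps)
    mag = np.mean(np.abs(np.log(S + eps) - np.log(S_hat + eps)))
    return sc + mag

def multi_resolution_stft_loss(y, y_hat,
                               resolutions=((1024, 120, 600),
                                            (2048, 240, 1200),
                                            (512, 50, 240))):
    """Average the single-resolution loss over several STFT settings."""
    return sum(stft_loss(y, y_hat, *r) for r in resolutions) / len(resolutions)
```

Averaging over several resolutions keeps the generator from overfitting one fixed time-frequency trade-off; the loss is zero only when the two waveforms match at every resolution.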



Results

Demo table of conversions between all 12 NUS corpus speakers (ADIZ, JLEE, JTAN, KENN, MCUR, MPOL, MPUR, NJAT, PMAR, SAMF, VKOW, ZHIY); rows are the source ("From") speaker and columns are the target ("To") speaker.