Reference papers
Parallel WaveGAN
Oord, A. V. D., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., ... & Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
Yamamoto, R., Song, E., & Kim, J. M. (2020, May). Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6199-6203). IEEE.
MelGAN embedding
Kumar, K., Kumar, R., de Boissiere, T., Gestin, L., Teoh, W. Z., Sotelo, J., ... & Courville, A. C. (2019). MelGAN: Generative adversarial networks for conditional waveform synthesis. In Advances in Neural Information Processing Systems (pp. 14910-14921).
| | Original | G: PWGAN, D: PWGAN, SR: 16 kHz | G: PWGAN, D: PWGAN, SR: 24 kHz | G: PWGAN, D: MelGAN, SR: 16 kHz |
|---|---|---|---|---|
| 24 kHz source, trained | | | | |
| 22.05 kHz source, trained | | | | |
| 16 kHz source, unseen | | | | |
| 24 kHz source, unseen, non-English | | | | |