
ML for Audio Study Group - Text to Speech Deep Dive



This week we'll do a deep dive into Text to Speech. You can ask your questions at https://discuss.huggingface.co/t/ml-for-audio-study-group-text-to-speech-deep-dive-jan-4/13315

- Join the discussion on Discord (http://hf.co/join/discord, #ml-4-audio-study-group channel).
- Check out the GitHub repository of the project: https://github.com/Vaibhavs10/ml-with-audio

Vaibhav (VB) is a consultant turned student researcher at the University of Stuttgart, Germany. His current research focuses on performance prediction for NLP models and speech synthesis. He is also an active volunteer with EuroPython and Python DE.

Vatsal left the world of mathematics in 2017 to dive into speech synthesis soon after coming across the WaveNet paper. His research has focused on normalising flows, a particular kind of deep generative model. At Amazon, he researched the deep-learning-based vocoding module used in production, as well as disentanglement in deep generative models for zero-shot speech generation (text-to-speech and voice conversion), publishing 4 papers, filing 5 patents, and developing multiple product proofs of concept. Beyond speech, Vatsal has also spent some time in a team of researchers focused on Bayesian models and sparse Gaussian processes.

00:00 Intro
02:15 Text to Speech Intro
15:30 Tacotron 2
25:50 Code examples and finding models
31:40 Journey of Speech Synthesis
44:03 Questions
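
The 25:50 segment covers code examples and finding models. As an illustration of that kind of workflow (not code from the session itself), here is a minimal sketch using the huggingface_hub library to list popular text-to-speech models on the Hub; the specific filter tag and sort options are assumptions about the Hub client API rather than anything prescribed in the video.

```python
from huggingface_hub import list_models

# Search the Hugging Face Hub for models tagged "text-to-speech",
# sorted by downloads, and print the five most popular model ids.
for model in list_models(filter="text-to-speech", sort="downloads", direction=-1, limit=5):
    print(model.id)
```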
Category: Audio