HPR3219: Linux Inlaws S01E18: Voice Recognition and Text to Speech
News category: Podcasts
Source: hackerpublicradio.org
In this episode, Chris is harassed by quite a few artificial nuisance callers, among them drug lords, Irish nurses and some random Linux Inlaws Chief Financial Officer. Based on these examples, our two heroes discuss the history and current state of text-to-speech (TTS) and voice recognition. We attempted to use voice recognition software to produce a transcript of the show.
Show notes:
- WaveNet: https://deepmind.com/blog/article/wavenet-generative-model-raw-audio
- Tacotron: https://ai.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html
- DeepSpeech: https://github.com/mozilla/DeepSpeech
- Lyrebird / Welcome.AI: https://www.welcome.ai/lyrebird
- Nvidia Tacotron 2: https://github.com/NVIDIA/tacotron2
- TensorFlow: https://www.tensorflow.org
- PyTorch: https://pytorch.org
- Mel spectrograms: https://medium.com/analytics-vidhya/understanding-the-mel-spectrogram-fca2afa2ce53
- Graphcore: https://www.graphcore.ai
- FPGA: https://en.wikipedia.org/wiki/Field-programmable_gate_array
- IBM ROMP: https://en.wikipedia.org/wiki/IBM_ROMP
- Google's TTS: https://cloud.google.com/text-to-speech
- Apple M1: https://www.gsmarena.com/the_apple_m1_is_the_first_armbased_chipset_for_macs_with_the_fastest_cpu_cores_and_top_igpu-news-46222.php
- Secure Enclaves: https://support.apple.com/guide/security/secure-enclave-overview-sec59b0b31ff/web
- OSDU: https://www.opengroup.org/osdu/forum-homepage
- Jack Kerouac's On the Road: https://en.wikipedia.org/wiki/On_the_Road