Text to speech generation


Category: Programming
Source: dev.to

This article is part of a tutorial series on txtai, an AI-powered semantic search platform.

Text To Speech (TTS) models have made great strides in quality over the last few years. Unfortunately, the libraries behind these models currently can't be used without installing a large number of dependencies.

The txtai TextToSpeech pipeline has the following objectives:

  • Fast performance both on CPU and GPU
  • Ability to batch large text values and stream them through the model
  • Minimal install footprint
  • All dependencies must be Apache 2.0 compatible
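
To make the batching objective concrete, here is a minimal sketch of splitting a large text value into sentence batches before synthesis. It is not how txtai does this internally; the `batch_sentences` helper, the `max_chars` parameter and the naive punctuation-based sentence split are all assumptions made for illustration.

```python
import re

def batch_sentences(text, max_chars=200):
    """Group sentences into batches of at most max_chars characters (hypothetical helper)."""
    # Naive split: sentence-ending punctuation followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    batch, size = [], 0
    for sentence in sentences:
        # Start a new batch when adding this sentence would exceed the limit
        if batch and size + len(sentence) + 1 > max_chars:
            yield " ".join(batch)
            batch, size = [], 0
        batch.append(sentence)
        size += len(sentence) + 1

    # Emit whatever is left over
    if batch:
        yield " ".join(batch)

text = "First sentence. Second sentence! Third one? Fourth."
batches = list(batch_sentences(text, max_chars=35))
print(batches)
```

Each batch could then be passed to a speech model in turn, which keeps memory use bounded regardless of the input length.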

This article will go through a set of text to speech generation examples.

Install dependencies

Install txtai and all dependencies.

# Install txtai
pip install txtai[pipeline] onnxruntime-gpu librosa

Create a TextToSpeech instance

The TextToSpeech instance is the main entrypoint for generating speech from text. The pipeline is backed by models from the ESPnet project. ESPnet has a number of high quality TTS models available on the Hugging Face Hub.

There are currently two models on the Hugging Face Hub that this pipeline can use.

The default model is ljspeech-jets-onnx. Each of these models is an ESPnet model exported to ONNX using espnet_onnx. More on that process can be found on the model pages.

from txtai.pipeline import TextToSpeech

# Create text-to-speech model
tts = TextToSpeech()

Generate speech

The first example shows how to generate speech from text. Let's give it a try!

import librosa.display
import matplotlib.pyplot as plt

text = "Text To Speech models have made great strides in quality over the last few years."

# Generate raw waveform speech
speech, rate = tts(text), 22050

# Plot the waveform (waveshow replaces waveplot, which was removed in librosa 0.10)
plt.figure(figsize=(15, 5))
plot = librosa.display.waveshow(speech, sr=rate)

Image description

The graph shows a plot of the audio. It clearly shows pauses between words and sentences as we would expect in spoken language. Now let's play the generated speech.

from IPython.display import Audio, display

import soundfile as sf

def play(speech):
  # Write the waveform to WAV, then convert to a 64 kbit/s MP3 to save space
  sf.write("speech.wav", speech, 22050)
  !ffmpeg -i speech.wav -y -b:a 64k speech.mp3 2> /dev/null

  # Play speech
  display(Audio(filename="speech.mp3"))

play(speech)
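
The pauses that are visible in the waveform plot can also be located numerically. The following is a rough sketch, assuming the waveform is a sequence of float amplitudes; the `find_pauses` helper and its `threshold` and `min_seconds` defaults are hypothetical and would need tuning for real speech. A synthetic tone with a silent gap stands in for the generated audio.

```python
import math

def find_pauses(samples, rate, threshold=0.02, min_seconds=0.1):
    """Return (start, end) times in seconds where |amplitude| stays below threshold."""
    pauses, start = [], None
    min_len = int(min_seconds * rate)

    for i, value in enumerate(samples):
        if abs(value) < threshold:
            # Entering (or continuing) a quiet stretch
            if start is None:
                start = i
        else:
            # Leaving a quiet stretch: keep it only if it was long enough
            if start is not None and i - start >= min_len:
                pauses.append((start / rate, i / rate))
            start = None

    # Handle a quiet stretch that runs to the end of the audio
    if start is not None and len(samples) - start >= min_len:
        pauses.append((start / rate, len(samples) / rate))

    return pauses

# Synthetic stand-in for speech: 0.5s tone, 0.2s silence, 0.5s tone
rate = 1000
tone = [0.5 * math.cos(2 * math.pi * 100 * n / rate) for n in range(500)]
samples = tone + [0.0] * 200 + tone

pauses = find_pauses(samples, rate)
print(pauses)
```

With the pipeline output, the call would be `find_pauses(speech, 22050)`, using the sample rate from the earlier examples.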

Transcribe audio back to text

Next we'll use OpenAI Whisper to transcribe the generated audio back to text.

from txtai.pipeline import Transcription

# Create transcription pipeline
transcribe = Transcription("openai/whisper-base")

# Transcribe the generated waveform and print the result
transcribe(speech, rate)
Text to speech models have made great strides in quality over the last few years.

As expected, the transcription matches the original text, apart from capitalization.
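
Note that the two strings differ in capitalization ("Text To Speech" vs. "Text to speech"), so a strict equality check would fail. A small sketch of a more forgiving comparison follows; the `normalize` helper is an illustration, not part of txtai.

```python
import string

def normalize(text):
    """Lowercase, strip punctuation and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())

original = "Text To Speech models have made great strides in quality over the last few years."
transcribed = "Text to speech models have made great strides in quality over the last few years."

print(original == transcribed)                        # False: casing differs
print(normalize(original) == normalize(transcribed))  # True
```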

Audio books

The TextToSpeech pipeline is designed to work with large blocks of text. It could be used to build audio for entire chapters of books.

In the next example, we'll read the beginning of The Great Gatsby.

# Beginning of The Great Gatsby from Project Gutenberg
# https://www.gutenberg.org/ebooks/64317

text = """
In my younger and more vulnerable years my father gave me some advice
that I've been turning over in my mind ever since.

“Whenever you feel like criticizing anyone,” he told me, “just
remember that all the people in this world haven't had the advantages
that you've had.”

He didn't say any more, but we've always been unusually communicative
in a reserved way, and I understood that he meant a great deal more
than that.
"""

speech = tts(text)
play(speech)
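
When generating audio for longer passages, it helps to estimate the playing time and file size up front. The sketch below assumes the 22050 Hz sample rate used in this article and the 64 kbit/s MP3 bitrate from the `play` function; the helper names are made up for illustration.

```python
RATE = 22050  # sample rate used throughout this article

def duration_seconds(num_samples, rate=RATE):
    """Playing time of a mono waveform with num_samples samples."""
    return num_samples / rate

def mp3_size_bytes(seconds, bitrate=64_000):
    """Approximate size of a constant-bitrate MP3 (bitrate in bits per second)."""
    return int(seconds * bitrate / 8)

def format_duration(seconds):
    """Format seconds as M:SS."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"

# Example: a waveform holding 75 seconds of audio
num_samples = RATE * 75
seconds = duration_seconds(num_samples)

print(format_duration(seconds))  # 1:15
print(mp3_size_bytes(seconds))   # 600000
```

For real output, `num_samples` would be `len(speech)`.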

Text To Speech Workflow

In the last example, we'll cover building a text-to-speech workflow. txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. This workflow is no different in that it connects multiple pipelines together, each of which is backed by machine learning models.

The workflow extracts text from a webpage, summarizes it and then generates audio of the summary.

summary:
  path: sshleifer/distilbart-cnn-12-6

textractor:
  join: true
  lines: false
  minlength: 100
  paragraphs: true
  sentences: false

texttospeech:

workflow:
  tts:
    tasks:
    - action: textractor
      task: url
    - action: summary
    - action: texttospeech

# Assumes the YAML configuration above is saved as workflow.yml
from txtai.app import Application

app = Application("workflow.yml")

speech = list(app.workflow("tts", ["https://en.wikipedia.org/wiki/Natural_language_processing"]))[0]

play(speech)
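
Conceptually, the workflow passes each input through its tasks in order, with the output of one task becoming the input of the next. The sketch below illustrates that idea only: it is not txtai's implementation, it ignores batching and other details, and `run_workflow` and the three stand-in functions are hypothetical placeholders for the textractor, summary and texttospeech pipelines.

```python
def run_workflow(tasks, elements):
    """Pass each element through every task in order, yielding the final results."""
    for element in elements:
        for task in tasks:
            element = task(element)
        yield element

# Stand-in actions (assumptions for illustration, no models involved)
def extract(url):
    return f"text extracted from {url}"

def summarize(text):
    return text[:20]

def speak(text):
    # Pretend each character becomes one audio sample
    return [0.0] * len(text)

results = list(run_workflow([extract, summarize, speak], ["https://example.com"]))
print(len(results), len(results[0]))
```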

Wrapping up

This article gave a brief introduction to text to speech models. The text to speech pipeline in txtai is designed to be easy to use and handles the most common text to speech tasks in English.

This work is made possible by the excellent advancements in text to speech modeling. ESPnet is a great project and is worth checking out for more advanced and wider-ranging use cases. This pipeline was also made possible by the great work from espnet_onnx in building a framework to export models to ONNX.

Looking forward to seeing what the community dreams up using this pipeline!
