🔧 Building a Real-Time Voice Assistant with Local LLMs on a Raspberry Pi
Source: dev.to
Introduction
In this post, I’m sharing how I turned a Raspberry Pi into a real-time, fully local voice assistant. The goal was to:
- Capture voice input through a web interface.
- Process the text using a local LLM (like Mistral) running on the Pi.
- Generate voice responses using Piper for text-to-speech (TTS).
- Stream everything in real-time via WebSockets.
All of this runs offline on the Raspberry Pi — no cloud services involved. Let’s dive into how I built it step by step!
1. Setting up the Raspberry Pi
First, I set up my Raspberry Pi with the latest Raspberry Pi OS. It’s important to enable hardware interfaces and connect a USB microphone and speaker.
Steps:
- Update the system:
sudo apt-get update
sudo apt-get upgrade
- Enable the audio interface:
sudo raspi-config
Navigate to System Options > Audio and select the correct output/input device.
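To double-check that the USB microphone and speaker are actually detected, the ALSA tools arecord and aplay (usually preinstalled on Raspberry Pi OS) can list the devices and do a quick record-and-playback test:
arecord -l
aplay -l
arecord -d 3 test.wav
aplay test.wav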
2. Installing Ollama for Local LLMs
Ollama makes it easy to run local LLMs like Mistral on your Raspberry Pi. I installed it using:
curl -fsSL https://ollama.com/install.sh | sh
Once installed, I pulled the Mistral model:
ollama pull mistral
To confirm it works, I ran a quick test:
ollama run mistral
The model was ready to process text right on the Pi!
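Besides the interactive CLI, Ollama also exposes a local HTTP API (port 11434 by default), which comes in handy later for scripting. A quick test from the shell:
curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Say hello in one sentence.", "stream": false}'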
3. Setting up Piper for Text-to-Speech (TTS)
For offline voice generation, I chose Piper — a fantastic open-source TTS engine.
- Install dependencies:
sudo apt-get install wget build-essential libsndfile1
- Download Piper for ARM64 (Raspberry Pi):
wget https://github.com/rhasspy/piper/releases/download/v1.0.0/piper_arm64.tar.gz
tar -xvzf piper_arm64.tar.gz
chmod +x piper
sudo mv piper /usr/local/bin/
- Test if Piper works:
echo "Hello, world!" | piper --model en_US --output_file output.wav
aplay output.wav
Now the Pi could "talk" back!
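One caveat: depending on the Piper build you grab, --model may expect the path to a downloaded .onnx voice file rather than a bare locale code. If that is the case for yours, the test call looks more like this (assuming an en_US voice file has already been downloaded):
echo "Hello, world!" | piper --model en_US-lessac-medium.onnx --output_file output.wav
aplay output.wav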
4. Creating the Backend (Node.js)
I built a simple Node.js server to:
- Accept text from the client (voice input from a web app).
- Process it using Mistral (via Ollama).
- Convert the LLM response to speech with Piper.
- Stream the audio back to the client.
server.js:
const express = require('express');
const { exec } = require('child_process');
const WebSocket = require('ws');

const app = express();
const PORT = 3001;

// Serve generated audio files (e.g. output.wav) so the client can fetch them over HTTP
app.use(express.static(__dirname));

// WebSocket setup
const wss = new WebSocket.Server({ port: 3002 });

wss.on('connection', (ws) => {
  console.log('Client connected');

  ws.on('message', (message) => {
    const prompt = message.toString(); // ws delivers a Buffer, so convert it first
    console.log('Received:', prompt);

    // Run the Mistral LLM via Ollama.
    // Note: interpolating user input into a shell command is fine for a single-user
    // hobby setup, but it is not safe against shell injection.
    exec(`ollama run mistral "${prompt}"`, (err, stdout) => {
      if (err) {
        console.error('LLM error:', err);
        ws.send(JSON.stringify({ text: 'Error processing your request.' }));
        return;
      }

      // Convert the LLM response to speech using Piper
      exec(`echo "${stdout}" | piper --model en_US --output_file output.wav`, (ttsErr) => {
        if (ttsErr) {
          console.error('Piper error:', ttsErr);
          ws.send(JSON.stringify({ text: 'Error generating speech.' }));
          return;
        }

        // Tell the client where to fetch the generated audio file
        ws.send(JSON.stringify({ text: stdout, audio: 'output.wav' }));
      });
    });
  });
});

app.listen(PORT, () => {
  console.log(`Server running at http://localhost:${PORT}`);
});
5. Building the Real-Time Web Interface (React)
For the frontend, I created a simple React app to:
- Record voice input.
- Display real-time text responses.
- Play the generated speech audio.
App.js:
import React, { useEffect, useRef, useState } from 'react';

function App() {
  const [text, setText] = useState('');
  const [response, setResponse] = useState('');
  const [audio, setAudio] = useState(null);
  const ws = useRef(null);

  useEffect(() => {
    // Open the WebSocket once, not on every render
    ws.current = new WebSocket('ws://localhost:3002');

    ws.current.onmessage = (event) => {
      const data = JSON.parse(event.data);
      setResponse(data.text);

      // Fetch the generated WAV file from the backend and make it playable
      fetch(`http://localhost:3001/${data.audio}`)
        .then((res) => res.blob())
        .then((blob) => setAudio(URL.createObjectURL(blob)));
    };

    return () => ws.current.close();
  }, []);

  const handleSend = () => {
    ws.current.send(text);
  };

  return (
    <div>
      <h1>Voice Assistant</h1>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={handleSend}>Send</button>
      <h2>Response:</h2>
      <p>{response}</p>
      {audio && <audio controls src={audio} />}
    </div>
  );
}

export default App;
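The component above sends typed text, while the introduction talks about capturing voice through the web interface. A rough sketch of the browser-side half of that, using the standard MediaRecorder API (the recordClip name is mine): the resulting Blob could be sent over the WebSocket, but turning it into text would still need a local speech-to-text step on the Pi, which this post doesn't cover.
// Sketch: record a short clip from the microphone with the MediaRecorder API.
async function recordClip(seconds = 5) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  return new Promise((resolve) => {
    recorder.onstop = () => {
      stream.getTracks().forEach((t) => t.stop()); // release the microphone
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    };
    recorder.start();
    setTimeout(() => recorder.stop(), seconds * 1000);
  });
}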
6. Running the Project
Once the backend and frontend were ready, I launched both:
- Start the backend:
node server.js
- Run the React app:
npm start
I accessed the web app on my Raspberry Pi’s IP at port 3000 and spoke into the mic — and voilà! The assistant responded in real-time, all processed locally.
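To keep the backend running after a reboot, a small systemd unit does the trick. A sketch, assuming the project lives in /home/pi/voice-assistant and node is at /usr/bin/node (adjust both to your setup), saved as /etc/systemd/system/voice-assistant.service:
[Unit]
Description=Voice assistant backend
After=network.target

[Service]
User=pi
WorkingDirectory=/home/pi/voice-assistant
ExecStart=/usr/bin/node server.js
Restart=on-failure

[Install]
WantedBy=multi-user.target
Then enable it with sudo systemctl enable --now voice-assistant.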
Conclusion
Building a real-time, fully offline voice assistant on a Raspberry Pi was an exciting challenge. With:
- Ollama for running local LLMs (like Mistral)
- Piper for high-quality text-to-speech
- WebSockets for real-time communication
- React for a smooth web interface
... I now have a personalized voice AI that works without relying on the cloud.