How to implement text to speech in SwiftUI
Create an AVSpeechSynthesizer, wrap your text in an AVSpeechUtterance, then call synthesizer.speak(_:). Store the synthesizer as a long-lived object so it isn't deallocated mid-speech.
import SwiftUI
import AVFoundation

struct TTSView: View {
    // @State keeps the synthesizer alive for the view's lifetime,
    // so it isn't deallocated mid-speech when the view struct is recreated.
    @State private var synthesizer = AVSpeechSynthesizer()
    @State private var text = "Hello from SwiftUI!"

    var body: some View {
        VStack(spacing: 16) {
            TextField("Enter text", text: $text)
                .textFieldStyle(.roundedBorder)
            Button("Speak") {
                let utterance = AVSpeechUtterance(string: text)
                utterance.rate = AVSpeechUtteranceDefaultSpeechRate
                synthesizer.speak(utterance)
            }
        }
        .padding()
    }
}
Full implementation
The complete solution wraps AVSpeechSynthesizer inside an @Observable class so SwiftUI can react to speaking state changes. We conform to AVSpeechSynthesizerDelegate to track when speech starts, pauses, and finishes — letting us drive UI like a progress indicator or a dynamic play/pause button. Rate, pitch, and voice language are all exposed as adjustable properties.
import SwiftUI
import AVFoundation

// MARK: - ViewModel

@Observable
final class SpeechViewModel: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()

    var isSpeaking = false
    var isPaused = false
    var rate: Float = AVSpeechUtteranceDefaultSpeechRate // 0.0 – 1.0
    var pitch: Float = 1.0 // 0.5 – 2.0
    var languageCode = "en-US"

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        guard !text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else { return }
        if synthesizer.isSpeaking { synthesizer.stopSpeaking(at: .immediate) }

        let utterance = AVSpeechUtterance(string: text)
        utterance.rate = rate
        utterance.pitchMultiplier = pitch
        utterance.volume = 1.0
        utterance.voice = AVSpeechSynthesisVoice(language: languageCode)
        synthesizer.speak(utterance)
    }

    func togglePause() {
        if synthesizer.isPaused {
            synthesizer.continueSpeaking()
        } else {
            synthesizer.pauseSpeaking(at: .word)
        }
    }

    func stop() {
        synthesizer.stopSpeaking(at: .immediate)
    }

    // MARK: AVSpeechSynthesizerDelegate

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didStart utterance: AVSpeechUtterance) {
        isSpeaking = true
        isPaused = false
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didPause utterance: AVSpeechUtterance) {
        isPaused = true
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didContinue utterance: AVSpeechUtterance) {
        isPaused = false
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        isSpeaking = false
        isPaused = false
    }

    // stopSpeaking(at:) triggers didCancel, not didFinish — without this
    // handler the UI would stay stuck in the "speaking" state after Stop.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didCancel utterance: AVSpeechUtterance) {
        isSpeaking = false
        isPaused = false
    }
}
// MARK: - View
struct TextToSpeechView: View {
    @State private var vm = SpeechViewModel()
    @State private var inputText = "SwiftUI makes text to speech surprisingly approachable."

    var body: some View {
        NavigationStack {
            Form {
                Section("Text") {
                    TextEditor(text: $inputText)
                        .frame(minHeight: 100)
                        .accessibilityLabel("Text to speak")
                }

                Section("Voice settings") {
                    LabeledContent("Rate") {
                        Slider(value: $vm.rate,
                               in: AVSpeechUtteranceMinimumSpeechRate...AVSpeechUtteranceMaximumSpeechRate)
                            .accessibilityLabel("Speech rate")
                    }
                    LabeledContent("Pitch") {
                        Slider(value: $vm.pitch, in: 0.5...2.0)
                            .accessibilityLabel("Pitch multiplier")
                    }
                    Picker("Language", selection: $vm.languageCode) {
                        Text("English (US)").tag("en-US")
                        Text("English (GB)").tag("en-GB")
                        Text("Spanish").tag("es-ES")
                        Text("French").tag("fr-FR")
                        Text("Japanese").tag("ja-JP")
                    }
                }

                Section {
                    HStack(spacing: 16) {
                        Button {
                            vm.speak(inputText)
                        } label: {
                            Label("Speak", systemImage: "play.fill")
                                .frame(maxWidth: .infinity)
                        }
                        .buttonStyle(.borderedProminent)
                        .disabled(vm.isSpeaking && !vm.isPaused)

                        Button {
                            vm.togglePause()
                        } label: {
                            Label(vm.isPaused ? "Resume" : "Pause",
                                  systemImage: vm.isPaused ? "play.circle" : "pause.circle")
                                .frame(maxWidth: .infinity)
                        }
                        .buttonStyle(.bordered)
                        .disabled(!vm.isSpeaking)

                        Button(role: .destructive) {
                            vm.stop()
                        } label: {
                            Label("Stop", systemImage: "stop.fill")
                                .frame(maxWidth: .infinity)
                        }
                        .buttonStyle(.bordered)
                        .disabled(!vm.isSpeaking)
                    }
                }
            }
            .navigationTitle("Text to Speech")
            .overlay(alignment: .top) {
                if vm.isSpeaking {
                    Label(vm.isPaused ? "Paused" : "Speaking…",
                          systemImage: vm.isPaused ? "pause.circle.fill" : "waveform")
                        .font(.footnote)
                        .padding(.horizontal, 12)
                        .padding(.vertical, 6)
                        .background(.thinMaterial, in: Capsule())
                        .padding(.top, 8)
                        .transition(.move(edge: .top).combined(with: .opacity))
                        .animation(.spring, value: vm.isSpeaking)
                }
            }
        }
    }
}

#Preview {
    TextToSpeechView()
}
How it works

- @Observable SpeechViewModel — Marking the class @Observable (iOS 17 Observation framework) means any view reading isSpeaking or isPaused automatically re-renders when those properties change, without manually publishing each one.
- AVSpeechSynthesizerDelegate — The didStart, didPause, didContinue, didFinish, and didCancel callbacks are the only reliable way to track speech state. Setting synthesizer.delegate = self in init() wires this up.
- AVSpeechUtterance configuration — Each call to speak(_:) creates a fresh utterance. Rate uses the system constant AVSpeechUtteranceDefaultSpeechRate (0.5) as a baseline; pitchMultiplier defaults to 1.0 and accepts values from 0.5 (lower) to 2.0 (higher).
- AVSpeechSynthesisVoice(language:) — Passing a BCP-47 language tag like "en-US" lets the system pick the best available voice for that locale. Enhanced and premium voices only become candidates after the user downloads them in Settings.
- Status pill overlay — The .overlay(alignment: .top) combined with .transition and .animation(.spring, value: vm.isSpeaking) gives non-intrusive feedback that disappears the moment speech ends.
Variants
Highlight spoken words in real time
Use the willSpeakRangeOfSpeechString delegate method to get the character range of the word currently being spoken, then highlight it in a SwiftUI Text view using AttributedString.
// Inside SpeechViewModel — add an observable range property
var highlightedRange: Range<String.Index>? = nil

func speechSynthesizer(
    _ synthesizer: AVSpeechSynthesizer,
    willSpeakRangeOfSpeechString characterRange: NSRange,
    utterance: AVSpeechUtterance
) {
    let str = utterance.speechString
    guard let range = Range(characterRange, in: str) else { return }
    highlightedRange = range
}

// Update the existing didFinish so it also clears the highlight
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                       didFinish utterance: AVSpeechUtterance) {
    isSpeaking = false
    isPaused = false
    highlightedRange = nil
}

// In your View — build a highlighted AttributedString
func attributed(_ source: String) -> AttributedString {
    var att = AttributedString(source)
    if let range = vm.highlightedRange,
       let attRange = Range(range, in: att) {
        att[attRange].backgroundColor = .yellow
        att[attRange].foregroundColor = .black
    }
    return att
}

// Render it
Text(attributed(inputText))
    .animation(.linear(duration: 0.05), value: vm.highlightedRange?.lowerBound)
List all available voices on device
Call AVSpeechSynthesisVoice.speechVoices() to enumerate every installed voice. Filter by voice.quality == .enhanced or .premium (added in iOS 16) to surface the highest-quality options. Use voice.identifier instead of a language code when you want a specific named voice like Siri Voice 2.
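During development, one convenient approach is to dump every voice to the console so you can find identifiers worth pinning. All APIs below are standard AVFoundation; only the print formatting is a stylistic choice:

import AVFoundation

// List every installed voice with its language, name, quality tier, and identifier.
for voice in AVSpeechSynthesisVoice.speechVoices()
    .sorted(by: { $0.language < $1.language }) {
    let quality: String
    switch voice.quality {
    case .premium:  quality = "premium"
    case .enhanced: quality = "enhanced"
    default:        quality = "default"
    }
    print("\(voice.language)  \(voice.name)  [\(quality)]  \(voice.identifier)")
}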
Common pitfalls

- ⚠️ Deallocated synthesizer: If you create AVSpeechSynthesizer() as a local variable inside a button action, ARC will destroy it before it finishes speaking. Always store it in a long-lived owner — an @Observable class, @StateObject, or @State held at the view level.
- ⚠️ Audio session conflicts: On iOS 17+ the system manages the audio session for AVSpeechSynthesizer automatically, but if your app also plays music or records audio, call try AVAudioSession.sharedInstance().setCategory(.playback, mode: .spokenAudio) before speaking to avoid interruption issues (see the sketch after this list).
- ⚠️ VoiceOver interference: When VoiceOver is active, AVSpeechSynthesizer may be interrupted or queued behind VoiceOver announcements. Test with VoiceOver on a real device and consider using UIAccessibility.post(notification: .announcement, argument:) for short accessibility-only strings instead.
- ⚠️ Rate clamping: AVSpeechUtteranceMinimumSpeechRate is 0.0 and AVSpeechUtteranceMaximumSpeechRate is 1.0 — values outside this range are silently clamped. The "default" constant is 0.5, which is already a comfortable reading pace for most text.
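The audio session pitfall is easier to see in code. Here is a minimal sketch; configureAudioSession() is a name made up for this example, and the category, mode, and options shown are one reasonable choice rather than the only valid setup:

import AVFoundation

// Hypothetical helper: call once before the first speak(_:).
func configureAudioSession() {
    do {
        let session = AVAudioSession.sharedInstance()
        // .spokenAudio signals long-form speech content;
        // .duckOthers lowers background music instead of stopping it.
        try session.setCategory(.playback, mode: .spokenAudio, options: [.duckOthers])
        try session.setActive(true)
    } catch {
        print("Audio session setup failed: \(error)")
    }
}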
Prompt this with Claude Code
When using Soarias or Claude Code directly to implement this:
Implement text to speech in SwiftUI for iOS 17+. Use AVSpeechSynthesizer, AVSpeechUtterance, and AVSpeechSynthesisVoice. Wrap the synthesizer in an @Observable ViewModel that exposes isSpeaking and isPaused state, and conforms to AVSpeechSynthesizerDelegate. Include rate and pitch Slider controls and a language Picker. Make it accessible (VoiceOver labels on all controls). Add a #Preview with realistic sample data.
Drop this prompt into the Soarias Build phase after your screens are scaffolded — it will generate the ViewModel, wire up the delegate, and keep your SwiftUI layer thin.
FAQ
Does this work on iOS 16?
The core AVSpeechSynthesizer APIs have been available since iOS 7. However, the @Observable macro used in the ViewModel requires iOS 17. For iOS 16 support, swap @Observable for ObservableObject with @Published properties and use @StateObject in the view. The synthesizer itself will work identically.
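For reference, a minimal sketch of the iOS 16 shell; only the publishing mechanics change, and the synthesizer logic carries over unchanged from the iOS 17 version above:

import SwiftUI
import AVFoundation

// iOS 16 variant: ObservableObject + @Published instead of @Observable.
final class SpeechViewModel16: NSObject, ObservableObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    @Published var isSpeaking = false
    @Published var isPaused = false

    override init() {
        super.init()
        synthesizer.delegate = self
    }
    // speak(_:), togglePause(), stop(), and the delegate methods
    // are identical to the iOS 17 version shown earlier.
}

// In the view, swap @State for @StateObject:
// @StateObject private var vm = SpeechViewModel16()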
How do I use a specific high-quality Siri voice instead of a generic locale voice?
Call AVSpeechSynthesisVoice.speechVoices() and filter for voices where voice.quality == .premium (added in iOS 16) or .enhanced. Then assign utterance.voice = AVSpeechSynthesisVoice(identifier: voice.identifier) using the exact identifier string (e.g. "com.apple.voice.premium.en-US.Zoe"). Note that premium voices must be downloaded by the user in Settings → Accessibility → Spoken Content → Voices before they're available.
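Putting that together, a sketch of a helper that prefers the best installed voice for a locale; bestVoice(for:) is a hypothetical name, not an AVFoundation API:

import AVFoundation

// Prefer premium, then enhanced, then any installed voice for the locale.
func bestVoice(for language: String) -> AVSpeechSynthesisVoice? {
    let candidates = AVSpeechSynthesisVoice.speechVoices()
        .filter { $0.language == language }
    return candidates.first { $0.quality == .premium }
        ?? candidates.first { $0.quality == .enhanced }
        ?? candidates.first
}

// Usage — fall back to the system default if nothing matched:
// utterance.voice = bestVoice(for: "en-US")
//     ?? AVSpeechSynthesisVoice(language: "en-US")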
What's the UIKit equivalent?
AVSpeechSynthesizer is a pure AVFoundation API with no UIKit dependency — it works identically in UIKit and SwiftUI. In a UIKit app you'd typically own the synthesizer in a UIViewController or a singleton service class, and implement the same AVSpeechSynthesizerDelegate methods to drive UI updates.
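A minimal sketch of such a service; SpeechService and its singleton shape are illustrative choices here, not a required pattern:

import AVFoundation

// Illustrative singleton owning the synthesizer for a UIKit app.
final class SpeechService: NSObject, AVSpeechSynthesizerDelegate {
    static let shared = SpeechService()
    private let synthesizer = AVSpeechSynthesizer()

    private override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        // Notify interested view controllers here, e.g. via
        // NotificationCenter or a delegate protocol of your own.
    }
}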
Last reviewed: 2026-05-11 by the Soarias team.