How to implement text to speech in SwiftUI

iOS 17+ · Xcode 16+ · Intermediate · API: AVSpeechSynthesizer · Updated: May 11, 2026
TL;DR

Create an AVSpeechSynthesizer, wrap your text in an AVSpeechUtterance, then call synthesizer.speak(_:). Store the synthesizer as a long-lived object so it isn't deallocated mid-speech.

import AVFoundation

struct TTSView: View {
    private let synthesizer = AVSpeechSynthesizer()
    @State private var text = "Hello from SwiftUI!"

    var body: some View {
        VStack(spacing: 16) {
            TextField("Enter text", text: $text)
                .textFieldStyle(.roundedBorder)
            Button("Speak") {
                let utterance = AVSpeechUtterance(string: text)
                utterance.rate = AVSpeechUtteranceDefaultSpeechRate
                synthesizer.speak(utterance)
            }
        }
        .padding()
    }
}

Full implementation

The complete solution wraps AVSpeechSynthesizer inside an @Observable class so SwiftUI can react to speaking state changes. We conform to AVSpeechSynthesizerDelegate to track when speech starts, pauses, and finishes — letting us drive UI like a progress indicator or a dynamic play/pause button. Rate, pitch, and voice language are all exposed as adjustable properties.

import SwiftUI
import AVFoundation

// MARK: - ViewModel

@Observable
final class SpeechViewModel: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()

    var isSpeaking = false
    var isPaused  = false
    var rate: Float = AVSpeechUtteranceDefaultSpeechRate   // 0.0 – 1.0
    var pitch: Float = 1.0                                  // 0.5 – 2.0
    var languageCode = "en-US"

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        guard !text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else { return }
        if synthesizer.isSpeaking { synthesizer.stopSpeaking(at: .immediate) }

        let utterance = AVSpeechUtterance(string: text)
        utterance.rate            = rate
        utterance.pitchMultiplier = pitch
        utterance.volume          = 1.0
        utterance.voice           = AVSpeechSynthesisVoice(language: languageCode)
        synthesizer.speak(utterance)
    }

    func togglePause() {
        if synthesizer.isPaused {
            synthesizer.continueSpeaking()
        } else {
            synthesizer.pauseSpeaking(at: .word)
        }
    }

    func stop() {
        synthesizer.stopSpeaking(at: .immediate)
    }

    // MARK: AVSpeechSynthesizerDelegate
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didStart utterance: AVSpeechUtterance) {
        isSpeaking = true
        isPaused   = false
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didPause utterance: AVSpeechUtterance) {
        isPaused = true
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didContinue utterance: AVSpeechUtterance) {
        isPaused = false
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        isSpeaking = false
        isPaused   = false
    }
}

// MARK: - View

struct TextToSpeechView: View {
    @State private var vm = SpeechViewModel()
    @State private var inputText = "SwiftUI makes text to speech surprisingly approachable."

    var body: some View {
        NavigationStack {
            Form {
                Section("Text") {
                    TextEditor(text: $inputText)
                        .frame(minHeight: 100)
                        .accessibilityLabel("Text to speak")
                }

                Section("Voice settings") {
                    LabeledContent("Rate") {
                        Slider(value: $vm.rate, in: AVSpeechUtteranceMinimumSpeechRate...AVSpeechUtteranceMaximumSpeechRate)
                            .accessibilityLabel("Speech rate")
                    }
                    LabeledContent("Pitch") {
                        Slider(value: $vm.pitch, in: 0.5...2.0)
                            .accessibilityLabel("Pitch multiplier")
                    }
                    Picker("Language", selection: $vm.languageCode) {
                        Text("English (US)").tag("en-US")
                        Text("English (GB)").tag("en-GB")
                        Text("Spanish").tag("es-ES")
                        Text("French").tag("fr-FR")
                        Text("Japanese").tag("ja-JP")
                    }
                }

                Section {
                    HStack(spacing: 16) {
                        Button {
                            vm.speak(inputText)
                        } label: {
                            Label("Speak", systemImage: "play.fill")
                                .frame(maxWidth: .infinity)
                        }
                        .buttonStyle(.borderedProminent)
                        .disabled(vm.isSpeaking && !vm.isPaused)

                        Button {
                            vm.togglePause()
                        } label: {
                            Label(vm.isPaused ? "Resume" : "Pause",
                                  systemImage: vm.isPaused ? "play.circle" : "pause.circle")
                                .frame(maxWidth: .infinity)
                        }
                        .buttonStyle(.bordered)
                        .disabled(!vm.isSpeaking)

                        Button(role: .destructive) {
                            vm.stop()
                        } label: {
                            Label("Stop", systemImage: "stop.fill")
                                .frame(maxWidth: .infinity)
                        }
                        .buttonStyle(.bordered)
                        .disabled(!vm.isSpeaking)
                    }
                }
            }
            .navigationTitle("Text to Speech")
            .overlay(alignment: .top) {
                if vm.isSpeaking {
                    Label(vm.isPaused ? "Paused" : "Speaking…",
                          systemImage: vm.isPaused ? "pause.circle.fill" : "waveform")
                        .font(.footnote)
                        .padding(.horizontal, 12)
                        .padding(.vertical, 6)
                        .background(.thinMaterial, in: Capsule())
                        .padding(.top, 8)
                        .transition(.move(edge: .top).combined(with: .opacity))
                        .animation(.spring, value: vm.isSpeaking)
                }
            }
        }
    }
}

#Preview {
    TextToSpeechView()
}

How it works

  1. @Observable SpeechViewModel — Marking the class @Observable (iOS 17 Observation framework) means any view reading isSpeaking or isPaused automatically re-renders when those properties change, without manually publishing each one.
  2. AVSpeechSynthesizerDelegate — The didStart, didPause, didContinue, and didFinish callbacks are the only reliable way to track speech state. Setting synthesizer.delegate = self in init() wires this up.
  3. AVSpeechUtterance configuration — Each call to speak(_:) creates a fresh utterance. Rate uses the system constant AVSpeechUtteranceDefaultSpeechRate (0.5) as a baseline; pitchMultiplier defaults to 1.0 and accepts values from 0.5 (lower) to 2.0 (higher).
  4. AVSpeechSynthesisVoice(language:) — Passing a BCP-47 language tag like "en-US" lets the system pick the best available voice for that locale. The initializer returns nil when no voice is installed for that tag, and higher-quality voices are not downloaded automatically — the user installs them in Settings → Accessibility → Spoken Content → Voices.
  5. Status pill overlay — The .overlay(alignment: .top) combined with .transition and .animation(.spring, value: vm.isSpeaking) gives non-intrusive feedback that disappears the moment speech ends.
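One detail worth guarding against in point 4: AVSpeechSynthesisVoice(language:) returns nil when no voice is installed for the requested tag. A minimal fallback sketch — preferredVoice(for:) is our helper name, not an API:

```swift
import AVFoundation

// Sketch: resolve a voice for a requested locale, falling back to the
// device's current language when no voice exists for that BCP-47 tag.
func preferredVoice(for languageCode: String) -> AVSpeechSynthesisVoice? {
    // The initializer returns nil when no voice is installed for the tag.
    if let voice = AVSpeechSynthesisVoice(language: languageCode) {
        return voice
    }
    // Fall back to the user's current language setting.
    return AVSpeechSynthesisVoice(language: AVSpeechSynthesisVoice.currentLanguageCode())
}
```

Assigning the result to `utterance.voice` then never silently drops to an unexpected default.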

Variants

Highlight spoken words in real time

Use the willSpeakRangeOfSpeechString delegate method to get the character range of the word currently being spoken, then highlight it in a SwiftUI Text view using AttributedString.

// Inside SpeechViewModel — add an observed range property
// (with @Observable there is no @Published; plain properties are tracked)
var highlightedRange: Range<String.Index>? = nil

func speechSynthesizer(
    _ synthesizer: AVSpeechSynthesizer,
    willSpeakRangeOfSpeechString characterRange: NSRange,
    utterance: AVSpeechUtterance
) {
    let str = utterance.speechString
    guard let range = Range(characterRange, in: str) else { return }
    highlightedRange = range
}

// Replaces the earlier didFinish implementation so it also clears the highlight
func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                       didFinish utterance: AVSpeechUtterance) {
    isSpeaking      = false
    isPaused        = false
    highlightedRange = nil
}

// In your View — build a highlighted AttributedString
func attributed(_ source: String) -> AttributedString {
    var att = AttributedString(source)
    if let range = vm.highlightedRange,
       let attRange = Range(range, in: att) {
        att[attRange].backgroundColor = .yellow
        att[attRange].foregroundColor = .black
    }
    return att
}

// Render it
Text(attributed(inputText))
    .animation(.linear(duration: 0.05), value: vm.highlightedRange?.lowerBound)

List all available voices on device

Call AVSpeechSynthesisVoice.speechVoices() to enumerate every installed voice. Filter by voice.quality == .enhanced or .premium to surface the highest-quality options. Use voice.identifier instead of a language code when you need one specific named voice. (Note that Siri's own voices are not exposed through this API.)
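A sketch of that enumeration, grouped by language. The output depends entirely on which voices are installed on the device:

```swift
import AVFoundation

// Sketch: list the higher-quality installed voices, grouped by language.
let voices = AVSpeechSynthesisVoice.speechVoices()
let highQuality = voices.filter { $0.quality == .enhanced || $0.quality == .premium }

let byLanguage = Dictionary(grouping: highQuality, by: \.language)
for (language, group) in byLanguage.sorted(by: { $0.key < $1.key }) {
    print(language)
    for voice in group {
        let label = voice.quality == .premium ? "premium" : "enhanced"
        print("  \(voice.name) (\(label)) — \(voice.identifier)")
    }
}
```

Feeding these `name`/`identifier` pairs into a Picker is a natural upgrade to the hard-coded language list in the main example.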

Common pitfalls

  1. Synthesizer deallocated mid-speech: creating the AVSpeechSynthesizer inside a button action lets it be released the moment the closure returns, cutting audio off. Keep it as a stored property, as both examples above do.
  2. Rate is not words-per-minute: utterance.rate is a normalized Float between AVSpeechUtteranceMinimumSpeechRate (0.0) and AVSpeechUtteranceMaximumSpeechRate (1.0), with AVSpeechUtteranceDefaultSpeechRate (0.5) in the middle.
  3. Missing voices: AVSpeechSynthesisVoice(language:) returns nil when no voice is installed for the tag; check the return value if a specific language is required.
  4. Silent on device: under the default audio session, speech respects the ring/silent switch. Configure AVAudioSession with the .playback category if speech must always be audible.
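One pitfall worth a concrete fix: under the default audio session, speech output respects the ring/silent switch. A one-time setup with the .playback category keeps it audible — configureAudioSessionForSpeech() is our helper name, and the error handling is deliberately thin:

```swift
import AVFoundation

// Sketch: route speech through the playback category so it is audible
// even when the ring/silent switch is on. Call once, e.g. at app launch.
func configureAudioSessionForSpeech() {
    do {
        // .spokenAudio tells the system this is speech-like content.
        try AVAudioSession.sharedInstance().setCategory(.playback, mode: .spokenAudio)
        try AVAudioSession.sharedInstance().setActive(true)
    } catch {
        print("Audio session setup failed: \(error)")
    }
}
```

Whether to override the silent switch is a product decision; leave the default session in place if speech should behave like other UI sounds.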

Prompt this with Claude Code

When using Soarias or Claude Code directly to implement this:

Implement text to speech in SwiftUI for iOS 17+.
Use AVSpeechSynthesizer, AVSpeechUtterance, and AVSpeechSynthesisVoice.
Wrap the synthesizer in an @Observable ViewModel that exposes
isSpeaking and isPaused state, and conforms to AVSpeechSynthesizerDelegate.
Include rate and pitch Slider controls and a language Picker.
Make it accessible (VoiceOver labels on all controls).
Add a #Preview with realistic sample data.

Drop this prompt into the Soarias Build phase after your screens are scaffolded — it will generate the ViewModel, wire up the delegate, and keep your SwiftUI layer thin.

FAQ

Does this work on iOS 16?

The core AVSpeechSynthesizer APIs have been available since iOS 7. However, the @Observable macro used in the ViewModel requires iOS 17. For iOS 16 support, swap @Observable for ObservableObject with @Published properties and use @StateObject in the view. The synthesizer itself will work identically.
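A minimal sketch of that iOS 16 fallback — SpeechViewModel16 is an illustrative name, and delegate methods beyond didStart/didFinish are omitted for brevity:

```swift
import SwiftUI
import AVFoundation

// Sketch: iOS 16 fallback using ObservableObject instead of @Observable.
// Only the property-wrapper plumbing changes; the synthesizer code is identical.
final class SpeechViewModel16: NSObject, ObservableObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()

    @Published var isSpeaking = false
    @Published var isPaused = false

    override init() {
        super.init()
        synthesizer.delegate = self
    }

    func speak(_ text: String) {
        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didStart utterance: AVSpeechUtterance) {
        isSpeaking = true
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        isSpeaking = false
    }
}

// In the view, swap @State for @StateObject:
// @StateObject private var vm = SpeechViewModel16()
```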

How do I use a specific high-quality voice instead of a generic locale voice?

Call AVSpeechSynthesisVoice.speechVoices() and filter for voices where voice.quality == .premium or .enhanced. Then assign utterance.voice = AVSpeechSynthesisVoice(identifier: voice.identifier) using the exact identifier string (e.g. "com.apple.voice.premium.en-US.Zoe"). Note that premium and enhanced voices must be downloaded by the user in Settings → Accessibility → Spoken Content → Voices before they appear in speechVoices(). (Siri's own voices are not available to AVSpeechSynthesizer at all.)
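A sketch of that selection logic, assuming the AVSpeechSynthesisVoiceQuality raw values rank default < enhanced < premium — bestVoice(for:) is our helper name:

```swift
import AVFoundation

// Sketch: prefer the highest-quality installed voice for a locale,
// falling back to the locale's default voice when none is downloaded.
func bestVoice(for languageCode: String) -> AVSpeechSynthesisVoice? {
    let candidates = AVSpeechSynthesisVoice.speechVoices()
        .filter { $0.language == languageCode }
    // Assumption: quality raw values order default < enhanced < premium.
    let ranked = candidates.sorted { $0.quality.rawValue > $1.quality.rawValue }
    if let best = ranked.first {
        return AVSpeechSynthesisVoice(identifier: best.identifier)
    }
    return AVSpeechSynthesisVoice(language: languageCode)
}
```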

What's the UIKit equivalent?

AVSpeechSynthesizer is a pure AVFoundation API with no UIKit dependency — it works identically in UIKit and SwiftUI. In a UIKit app you'd typically own the synthesizer in a UIViewController or a singleton service class, and implement the same AVSpeechSynthesizerDelegate methods to drive UI updates.
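For reference, a minimal UIKit sketch of the same ownership pattern — SpeechViewController and speakTapped() are illustrative names:

```swift
import UIKit
import AVFoundation

// Sketch: the view controller owns the synthesizer so it outlives
// each utterance, mirroring the stored-property rule from SwiftUI.
final class SpeechViewController: UIViewController, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()

    override func viewDidLoad() {
        super.viewDidLoad()
        synthesizer.delegate = self
    }

    @objc func speakTapped() {
        synthesizer.speak(AVSpeechUtterance(string: "Hello from UIKit"))
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        // Update buttons/labels here, e.g. re-enable the speak button.
    }
}
```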

Last reviewed: 2026-05-11 by the Soarias team.