How to Build Speech Recognition in SwiftUI

iOS 17+ · Xcode 16+ · Advanced · APIs: Speech, AVFoundation · Updated: May 12, 2026
TL;DR

Create an SFSpeechAudioBufferRecognitionRequest, pipe microphone buffers from AVAudioEngine into it, and read result.bestTranscription.formattedString for a live transcript. Wrap the logic in an @Observable class so SwiftUI re-renders automatically.

import Speech

// 1. Request permissions (call once from .task { })
let speechStatus = await withCheckedContinuation { cont in
    SFSpeechRecognizer.requestAuthorization { cont.resume(returning: $0) }
}
let micGranted = await AVAudioApplication.requestRecordPermission()
// Proceed only when speechStatus == .authorized && micGranted

// 2. Create request with partial results
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true

// 3. Start recognition and stream results
let task = recognizer.recognitionTask(with: request) { result, _ in
    transcript = result?.bestTranscription.formattedString ?? transcript
}

// 4. Feed the mic into the request
audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024,
    format: audioEngine.inputNode.outputFormat(forBus: 0)) { buf, _ in
    request.append(buf)
}

Full implementation

The cleanest architecture for speech recognition in SwiftUI is an @Observable class—new in iOS 17—that owns the audio engine, recognition request, and task lifecycle. The SwiftUI view simply observes transcript and isRecording and drives a toggle button. Remember to add NSMicrophoneUsageDescription and NSSpeechRecognitionUsageDescription to your Info.plist before running.

import SwiftUI
import Speech
import AVFoundation

// MARK: - Observable model

@Observable
@MainActor
final class SpeechRecognizer {
    var transcript: String = ""
    var isRecording: Bool = false
    var permissionsGranted: Bool = false
    var errorMessage: String?

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()

    func requestPermissions() async {
        let speechStatus = await withCheckedContinuation { cont in
            SFSpeechRecognizer.requestAuthorization { cont.resume(returning: $0) }
        }
        let micGranted = await AVAudioApplication.requestRecordPermission()
        permissionsGranted = (speechStatus == .authorized) && micGranted
    }

    func startRecording() {
        guard !audioEngine.isRunning else { return }
        recognitionTask?.cancel()
        recognitionTask = nil
        transcript = ""

        do {
            let session = AVAudioSession.sharedInstance()
            try session.setCategory(.record, mode: .measurement, options: .duckOthers)
            try session.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            errorMessage = "Audio session error: \(error.localizedDescription)"
            return
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        guard let request = recognitionRequest else { return }
        request.shouldReportPartialResults = true
        request.requiresOnDeviceRecognition = false   // flip to true for offline

        recognitionTask = speechRecognizer.recognitionTask(with: request) { [weak self] result, error in
            // The handler may fire off the main thread; extract plain values and hop to the main actor first.
            let text = result?.bestTranscription.formattedString
            let done = (error != nil) || (result?.isFinal ?? false)
            Task { @MainActor in
                guard let self else { return }
                if let text { self.transcript = text }
                if done { self.stopRecording() }
            }
        }

        let inputNode = audioEngine.inputNode
        let fmt = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: fmt) { [weak self] buf, _ in
            self?.recognitionRequest?.append(buf)
        }

        audioEngine.prepare()
        do { try audioEngine.start() } catch {
            errorMessage = "Engine start error: \(error.localizedDescription)"
            return
        }
        isRecording = true
    }

    func stopRecording() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionRequest = nil
        recognitionTask?.cancel()
        recognitionTask = nil
        // Deactivating the session restores (un-ducks) other apps' audio.
        try? AVAudioSession.sharedInstance().setActive(false, options: .notifyOthersOnDeactivation)
        isRecording = false
    }
}

// MARK: - SwiftUI View

struct SpeechRecognitionView: View {
    @State private var recognizer = SpeechRecognizer()

    var body: some View {
        VStack(spacing: 24) {
            ScrollView {
                Text(recognizer.transcript.isEmpty ? "Tap the mic and start speaking…" : recognizer.transcript)
                    .font(.title3)
                    .foregroundStyle(recognizer.transcript.isEmpty ? .secondary : .primary)
                    .frame(maxWidth: .infinity, alignment: .leading)
                    .padding()
                    .animation(.easeInOut, value: recognizer.transcript)
            }
            .frame(maxHeight: 280)
            .background(.quaternary, in: RoundedRectangle(cornerRadius: 16))

            Button {
                recognizer.isRecording ? recognizer.stopRecording() : recognizer.startRecording()
            } label: {
                Image(systemName: recognizer.isRecording ? "mic.fill" : "mic")
                    .font(.system(size: 34))
                    .foregroundStyle(recognizer.isRecording ? .red : .accentColor)
                    .symbolEffect(.pulse, isActive: recognizer.isRecording)
                    .frame(width: 80, height: 80)
                    .background(.quaternary, in: Circle())
            }
            .accessibilityLabel(recognizer.isRecording ? "Stop recording" : "Start recording")
            .disabled(!recognizer.permissionsGranted)

            if let msg = recognizer.errorMessage {
                Text(msg).font(.caption).foregroundStyle(.red)
            }
        }
        .padding()
        .navigationTitle("Speech Recognition")
        .task { await recognizer.requestPermissions() }
    }
}

#Preview {
    NavigationStack { SpeechRecognitionView() }
}

How it works

  1. Permission gate (requestPermissions()). Speech recognition requires two separate authorizations: SFSpeechRecognizer.requestAuthorization uses a completion-handler API bridged to async via withCheckedContinuation, and AVAudioApplication.requestRecordPermission() (iOS 17 replacement for the old session call) is already async. Both must return success before the mic button is enabled.
  2. Audio session setup inside startRecording(). Setting the session category to .record with .duckOthers lowers background audio (music, podcasts) while transcription is active and automatically restores it on stop.
  3. Buffer tap → recognition request pipeline. inputNode.installTap fires a closure for every 1024-sample audio buffer captured by the microphone. Each buffer is appended directly to SFSpeechAudioBufferRecognitionRequest, which streams it to Apple's speech servers (or the on-device model when requiresOnDeviceRecognition = true).
  4. Partial results and final detection. With shouldReportPartialResults = true, the task closure fires after each recognized phrase fragment, updating transcript. When result.isFinal == true or an error occurs the session cleans itself up via stopRecording().
  5. @Observable + @MainActor. Marking SpeechRecognizer with both attributes keeps SwiftUI updates on the main thread without manual DispatchQueue.main.async calls. The recognition handler itself, however, can fire on a background queue, which is why the code extracts plain values from the result and hops into Task { @MainActor in … } before mutating transcript. A variant of the handler that also surfaces errors through errorMessage is sketched after this list.
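Building on points 4 and 5, here is a minimal sketch of a result handler that also reports the failure reason to the UI through the model's existing errorMessage property. Names match the class above; the exact split between partial and final handling is a design choice, not the only correct one.

// Drop-in replacement for the result handler inside startRecording().
recognitionTask = speechRecognizer.recognitionTask(with: request) { [weak self] result, error in
    let text = result?.bestTranscription.formattedString
    let isFinal = result?.isFinal ?? false
    let failure = error?.localizedDescription
    Task { @MainActor in
        guard let self else { return }
        if let text { self.transcript = text }
        if let failure { self.errorMessage = failure }
        if failure != nil || isFinal { self.stopRecording() }
    }
}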

Variants

On-device (offline) recognition

Set requiresOnDeviceRecognition = true before starting the task. iOS downloads the on-device model automatically (A12 Bionic or later required). No network, no server round-trips—ideal for privacy-sensitive apps. Accuracy is slightly lower than the server model for uncommon vocabulary.

request.requiresOnDeviceRecognition = true

// Optionally bias recognition toward domain-specific vocabulary with a custom
// language model (iOS 17+). This assumes a pre-built CustomLMData.bin asset in
// the app bundle; a sketch of how to generate it follows below.
if #available(iOS 17, *) {
    let lmURL = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask)[0]
        .appendingPathComponent("CustomLM")
    let lmConfiguration = SFSpeechLanguageModel.Configuration(languageModel: lmURL)

    if let assetURL = Bundle.main.url(forResource: "CustomLMData", withExtension: "bin") {
        try await SFSpeechLanguageModel.prepareCustomLanguageModel(
            for: assetURL,
            clientIdentifier: "com.example.myapp",
            configuration: lmConfiguration
        )
        request.customizedLanguageModel = lmConfiguration
    }
}
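The .bin asset is produced ahead of time, for example from a small helper tool or test target, with SFCustomLanguageModelData. A minimal sketch, assuming the same client identifier and the phrase list mentioned above (file paths and counts are placeholders):

import Speech

// Build training data that biases recognition toward app-specific phrases,
// then export it and ship the resulting file in the app bundle as CustomLMData.bin.
let data = SFCustomLanguageModelData(
    locale: Locale(identifier: "en-US"),
    identifier: "com.example.myapp",
    version: "1.0"
) {
    SFCustomLanguageModelData.PhraseCount(phrase: "Soarias", count: 10)
    SFCustomLanguageModelData.PhraseCount(phrase: "SwiftUI", count: 10)
    SFCustomLanguageModelData.PhraseCount(phrase: "Xcode", count: 10)
    SFCustomLanguageModelData.PhraseCount(phrase: "TestFlight", count: 10)
}
try await data.export(to: URL(filePath: "/tmp/CustomLMData.bin"))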

Multi-language / locale picker

Instantiate a different SFSpeechRecognizer per locale and check SFSpeechRecognizer.supportedLocales() at runtime to build a language picker. Swap the recognizer inside startRecording() before creating the task—no other changes needed. Always call stopRecording() before switching locales to avoid engine conflicts.
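A minimal sketch of such a picker, assuming the SpeechRecognizer model gains a selectedLocale property that startRecording() reads when it creates its SFSpeechRecognizer:

import SwiftUI
import Speech

// Hypothetical picker; selectedLocale would live on the SpeechRecognizer model.
struct RecognitionLocalePicker: View {
    @Binding var selectedLocale: Locale

    private var locales: [Locale] {
        SFSpeechRecognizer.supportedLocales().sorted { $0.identifier < $1.identifier }
    }

    var body: some View {
        Picker("Language", selection: $selectedLocale) {
            ForEach(locales, id: \.identifier) { locale in
                Text(Locale.current.localizedString(forIdentifier: locale.identifier) ?? locale.identifier)
                    .tag(locale)
            }
        }
    }
}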

Prompt this with Claude Code

When using Soarias or Claude Code directly to implement this:

Implement speech recognition in SwiftUI for iOS 17+.
Use SFSpeechRecognizer, SFSpeechAudioBufferRecognitionRequest, and AVAudioEngine.
Wrap all state in an @Observable @MainActor class.
Handle permissions for both microphone and speech recognition.
Support optional on-device recognition via requiresOnDeviceRecognition.
Make it accessible (VoiceOver labels for mic button, dynamic state).
Add a #Preview with realistic sample data showing a partial transcript.

In Soarias's Build phase, paste this prompt into the Implementation panel—Claude Code will generate the SpeechRecognizer model and SwiftUI view, then Soarias automatically wires up the Info.plist entries and runs a simulator build to catch permission-key errors before you touch a real device.

FAQ

Does this work on iOS 16?

SFSpeechRecognizer itself goes back to iOS 10, but this implementation uses three iOS 17-only features: the @Observable macro, AVAudioApplication.requestRecordPermission(), and the #Preview macro. To backport to iOS 16, swap @Observable for ObservableObject, use AVAudioSession.sharedInstance().requestRecordPermission, and switch to PreviewProvider.
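For instance, a sketch of the iOS 16 permission flow, assuming the class has been converted to an ObservableObject with @Published properties:

// iOS 16 variant: bridge both completion-handler APIs with continuations.
func requestPermissionsLegacy() async {
    let speechStatus = await withCheckedContinuation { cont in
        SFSpeechRecognizer.requestAuthorization { cont.resume(returning: $0) }
    }
    let micGranted = await withCheckedContinuation { cont in
        AVAudioSession.sharedInstance().requestRecordPermission { cont.resume(returning: $0) }
    }
    permissionsGranted = (speechStatus == .authorized) && micGranted
}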

How do I get punctuation in the transcript?

Set addsPunctuation = true on your SFSpeechAudioBufferRecognitionRequest (available since iOS 16). The server model will insert commas, periods, and question marks automatically based on prosody. This property has no effect when requiresOnDeviceRecognition = true on older device models.
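For example:

let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true
request.addsPunctuation = true   // iOS 16+: inserts commas, periods, question marks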

What's the UIKit equivalent?

The underlying Speech framework is UIKit-agnostic: use the exact same SFSpeechRecognizer / AVAudioEngine stack from a UIViewController. The only difference is that you update a UILabel directly instead of writing to an @Observable property. For a dictation UX that closely mirrors the iOS keyboard, consider a UITextField and the UITextInput method dictationRecordingDidEnd() instead.
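A minimal UIKit sketch (permission requests and audio-session setup are assumed to have happened already, exactly as in the SwiftUI version; transcriptLabel still needs to be added to the view hierarchy):

import UIKit
import Speech
import AVFoundation

final class DictationViewController: UIViewController {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?
    let transcriptLabel = UILabel()

    func startDictation() throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.request = request

        // Update the label directly from the result handler.
        task = recognizer.recognitionTask(with: request) { [weak self] result, _ in
            guard let text = result?.bestTranscription.formattedString else { return }
            DispatchQueue.main.async { self?.transcriptLabel.text = text }
        }

        let input = audioEngine.inputNode
        input.installTap(onBus: 0, bufferSize: 1024, format: input.outputFormat(forBus: 0)) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()
    }
}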

Last reviewed: 2026-05-12 by the Soarias team.