How to Build Speech Recognition in SwiftUI
Create an SFSpeechAudioBufferRecognitionRequest, pipe microphone buffers from
AVAudioEngine into it, and read
result.bestTranscription.formattedString for a live transcript.
Wrap the logic in an @Observable class so SwiftUI re-renders automatically.
import Speech
import AVFoundation

// Condensed sketch; assumes these live in your model:
let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
let audioEngine = AVAudioEngine()
var transcript = ""

// 1. Request permissions (call once from .task { })
let speechStatus = await withCheckedContinuation { cont in
    SFSpeechRecognizer.requestAuthorization { cont.resume(returning: $0) }
}
let speechOK = speechStatus == .authorized
let micOK = await AVAudioApplication.requestRecordPermission()

// 2. Create request with partial results
let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true

// 3. Start recognition and stream results
let task = recognizer.recognitionTask(with: request) { result, _ in
    transcript = result?.bestTranscription.formattedString ?? transcript
}

// 4. Feed the mic into the request
audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024,
                                 format: audioEngine.inputNode.outputFormat(forBus: 0)) { buf, _ in
    request.append(buf)
}

// 5. Start the engine
audioEngine.prepare()
try audioEngine.start()
Full implementation
The cleanest architecture for speech recognition in SwiftUI is an @Observable
class—new in iOS 17—that owns the audio engine, recognition request, and task lifecycle. The SwiftUI view simply observes
transcript and isRecording
and drives a toggle button. Remember to add NSMicrophoneUsageDescription and
NSSpeechRecognitionUsageDescription to your Info.plist before running.
import SwiftUI
import Speech
import AVFoundation

// MARK: - Observable model

@Observable
@MainActor
final class SpeechRecognizer {
    var transcript: String = ""
    var isRecording: Bool = false
    var permissionsGranted: Bool = false
    var errorMessage: String?

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private let audioEngine = AVAudioEngine()

    func requestPermissions() async {
        let speechStatus = await withCheckedContinuation { cont in
            SFSpeechRecognizer.requestAuthorization { cont.resume(returning: $0) }
        }
        let micGranted = await AVAudioApplication.requestRecordPermission()
        permissionsGranted = (speechStatus == .authorized) && micGranted
    }

    func startRecording() {
        guard !audioEngine.isRunning else { return }
        recognitionTask?.cancel()
        recognitionTask = nil
        transcript = ""

        do {
            let session = AVAudioSession.sharedInstance()
            try session.setCategory(.record, mode: .measurement, options: .duckOthers)
            try session.setActive(true, options: .notifyOthersOnDeactivation)
        } catch {
            errorMessage = "Audio session error: \(error.localizedDescription)"
            return
        }

        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        request.requiresOnDeviceRecognition = false // flip to true for offline
        recognitionRequest = request

        recognitionTask = speechRecognizer.recognitionTask(with: request) { [weak self] result, error in
            // The result handler arrives on a background queue; hop back to
            // the main actor before touching observable state.
            Task { @MainActor in
                guard let self else { return }
                if let result { self.transcript = result.bestTranscription.formattedString }
                if error != nil || result?.isFinal == true { self.stopRecording() }
            }
        }

        let inputNode = audioEngine.inputNode
        let fmt = inputNode.outputFormat(forBus: 0)
        // The tap fires on the audio thread; capture the request directly
        // instead of reading main-actor state from the callback.
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: fmt) { buf, _ in
            request.append(buf)
        }

        audioEngine.prepare()
        do { try audioEngine.start() } catch {
            errorMessage = "Engine start error: \(error.localizedDescription)"
            return
        }
        isRecording = true
    }

    func stopRecording() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionRequest = nil
        recognitionTask?.cancel()
        recognitionTask = nil
        isRecording = false
    }
}
// MARK: - SwiftUI View

struct SpeechRecognitionView: View {
    @State private var recognizer = SpeechRecognizer()

    var body: some View {
        VStack(spacing: 24) {
            ScrollView {
                Text(recognizer.transcript.isEmpty ? "Tap the mic and start speaking…" : recognizer.transcript)
                    .font(.title3)
                    .foregroundStyle(recognizer.transcript.isEmpty ? .secondary : .primary)
                    .frame(maxWidth: .infinity, alignment: .leading)
                    .padding()
                    .animation(.easeInOut, value: recognizer.transcript)
            }
            .frame(maxHeight: 280)
            .background(.quaternary, in: RoundedRectangle(cornerRadius: 16))

            Button {
                recognizer.isRecording ? recognizer.stopRecording() : recognizer.startRecording()
            } label: {
                Image(systemName: recognizer.isRecording ? "mic.fill" : "mic")
                    .font(.system(size: 34))
                    .foregroundStyle(recognizer.isRecording ? .red : .accentColor)
                    .symbolEffect(.pulse, isActive: recognizer.isRecording)
                    .frame(width: 80, height: 80)
                    .background(.quaternary, in: Circle())
            }
            .accessibilityLabel(recognizer.isRecording ? "Stop recording" : "Start recording")
            .disabled(!recognizer.permissionsGranted)

            if let msg = recognizer.errorMessage {
                Text(msg).font(.caption).foregroundStyle(.red)
            }
        }
        .padding()
        .navigationTitle("Speech Recognition")
        .task { await recognizer.requestPermissions() }
    }
}

#Preview {
    NavigationStack { SpeechRecognitionView() }
}
How it works
- Permission gate (requestPermissions()). Speech recognition requires two separate authorizations: SFSpeechRecognizer.requestAuthorization uses a completion-handler API bridged to async via withCheckedContinuation, and AVAudioApplication.requestRecordPermission() (the iOS 17 replacement for the old session call) is already async. Both must succeed before the mic button is enabled.
- Audio session setup inside startRecording(). Setting the session category to .record with .duckOthers lowers background audio (music, podcasts) while transcription is active and automatically restores it on stop.
- Buffer tap → recognition request pipeline. inputNode.installTap fires a closure for every 1024-sample audio buffer captured by the microphone. Each buffer is appended directly to the SFSpeechAudioBufferRecognitionRequest, which streams it to Apple's speech servers (or the on-device model when requiresOnDeviceRecognition = true).
- Partial results and final detection. With shouldReportPartialResults = true, the task closure fires after each recognized phrase fragment, updating transcript. When result.isFinal == true or an error occurs, the session cleans itself up via stopRecording().
- @Observable + @MainActor. Marking SpeechRecognizer with both attributes isolates its state to the main actor, so SwiftUI updates happen on the main thread with no manual DispatchQueue.main.async calls. Because the recognition callbacks arrive on background queues, the result handler hops back with Task { @MainActor in ... } and the tap closure captures the request directly rather than touching model state.
Variants
On-device (offline) recognition
Set requiresOnDeviceRecognition = true before starting the task.
iOS downloads the on-device model automatically (A12 Bionic or later required). No network, no server round-trips—ideal for
privacy-sensitive apps. Accuracy is slightly lower than the server model for uncommon vocabulary.
request.requiresOnDeviceRecognition = true

// Optionally bias recognition toward domain-specific phrases (iOS 17+):
// build training data, export it, compile it, then attach it to the
// request. Runs in an async throwing context.
if #available(iOS 17, *) {
    let data = SFCustomLanguageModelData(locale: Locale(identifier: "en-US"),
                                         identifier: "com.example.myapp",
                                         version: "1.0") {
        SFCustomLanguageModelData.PhraseCount(phrase: "Soarias", count: 10)
        SFCustomLanguageModelData.PhraseCount(phrase: "SwiftUI", count: 10)
        SFCustomLanguageModelData.PhraseCount(phrase: "Xcode", count: 10)
        SFCustomLanguageModelData.PhraseCount(phrase: "TestFlight", count: 10)
    }
    let assetURL = URL.temporaryDirectory.appending(path: "CustomLMData.bin")
    try await data.export(to: assetURL)

    let lmConfig = SFSpeechLanguageModel.Configuration(
        languageModel: URL.temporaryDirectory.appending(path: "LM")
    )
    try await SFSpeechLanguageModel.prepareCustomLanguageModel(
        for: assetURL,
        clientIdentifier: "com.example.myapp",
        configuration: lmConfig
    )
    request.customizedLanguageModel = lmConfig
}
Multi-language / locale picker
Instantiate a different SFSpeechRecognizer per locale and check
SFSpeechRecognizer.supportedLocales() at runtime to build a language
picker. Swap the recognizer inside startRecording() before creating the task—no other changes needed.
Always call stopRecording() before switching locales to avoid engine conflicts.
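As a sketch, a SwiftUI picker over the supported locales might look like the following. LocalePicker and its selection binding are illustrative names, and the model is assumed to recreate its SFSpeechRecognizer from the chosen locale.

import SwiftUI
import Speech

struct LocalePicker: View {
    @Binding var selection: Locale
    // supportedLocales() returns a Set, so sort for a stable menu order
    private let locales = SFSpeechRecognizer.supportedLocales()
        .sorted { $0.identifier < $1.identifier }

    var body: some View {
        Picker("Language", selection: $selection) {
            ForEach(locales, id: \.identifier) { locale in
                Text(Locale.current.localizedString(forIdentifier: locale.identifier) ?? locale.identifier)
                    .tag(locale)
            }
        }
    }
}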
Common pitfalls
- Missing Info.plist keys. You need both NSMicrophoneUsageDescription and NSSpeechRecognitionUsageDescription in your target's Info.plist. Missing either crashes the app on the permission call with only a console message, not a visible alert.
- Forgetting to remove the audio tap. Calling audioEngine.stop() without first calling inputNode.removeTap(onBus: 0) throws an exception the next time you try to start recording. Always remove the tap in stopRecording().
- Recognition task closure arrives off the main thread. Apple calls the task result closure on an internal background queue. Even with @MainActor on your model class, mutate observable state only after hopping back (as the Task { @MainActor in ... } wrapper above does); updating properties directly from the callback causes runtime warnings and potential UI glitches.
- 1-minute server-side limit. Apple's cloud recognizer automatically ends sessions after approximately 60 seconds of audio. For long recordings, detect the final result, restart the session, and concatenate transcripts (see the sketch after this list); don't rely on a single task for unbounded dictation.
- Accessibility label on the mic button. A pulsing mic.fill icon conveys state visually but nothing to VoiceOver. Always provide .accessibilityLabel("Stop recording") / "Start recording" dynamically based on recording state.
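A sketch of the restart pattern for long dictation, assuming the SpeechRecognizer model above gains a hypothetical fullTranscript property that accumulates finished segments:

// In startRecording(), a restarting variant of the result handler.
// `fullTranscript` is a hypothetical new property on the model.
recognitionTask = speechRecognizer.recognitionTask(with: request) { [weak self] result, error in
    Task { @MainActor in
        guard let self else { return }
        if let result { self.transcript = result.bestTranscription.formattedString }
        if result?.isFinal == true {
            self.fullTranscript += self.transcript + " "  // bank the finished segment
            self.stopRecording()
            self.startRecording()                         // begin a fresh ~60 s session
        } else if error != nil {
            self.stopRecording()
        }
    }
}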
Prompt this with Claude Code
When using Soarias or Claude Code directly to implement this:
Implement speech recognition in SwiftUI for iOS 17+. Use SFSpeechRecognizer, SFSpeechAudioBufferRecognitionRequest, and AVAudioEngine. Wrap all state in an @Observable @MainActor class. Handle permissions for both microphone and speech recognition. Support optional on-device recognition via requiresOnDeviceRecognition. Make it accessible (VoiceOver labels for mic button, dynamic state). Add a #Preview with realistic sample data showing a partial transcript.
In Soarias's Build phase, paste this prompt into the Implementation panel—Claude Code will generate the
SpeechRecognizer model and SwiftUI view, then Soarias automatically
wires up the Info.plist entries and runs a simulator build to catch permission-key errors before you touch a real device.
FAQ
Does this work on iOS 16?
SFSpeechRecognizer itself goes back to iOS 10, but this implementation
uses three iOS 17-only features: the @Observable macro,
AVAudioApplication.requestRecordPermission(), and the
#Preview macro. To backport to iOS 16, swap
@Observable for ObservableObject,
use AVAudioSession.sharedInstance().requestRecordPermission, and switch
to PreviewProvider.
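A sketch of the iOS 16 shape, keeping the same property names as the model above (startRecording() and stopRecording() carry over unchanged):

import SwiftUI
import Speech
import AVFoundation

@MainActor
final class SpeechRecognizer: ObservableObject {
    @Published var transcript = ""
    @Published var isRecording = false
    @Published var permissionsGranted = false

    func requestPermissions() async {
        let speechStatus = await withCheckedContinuation { cont in
            SFSpeechRecognizer.requestAuthorization { cont.resume(returning: $0) }
        }
        // Pre-iOS 17 record-permission API (replaced by AVAudioApplication)
        let micGranted = await withCheckedContinuation { cont in
            AVAudioSession.sharedInstance().requestRecordPermission { cont.resume(returning: $0) }
        }
        permissionsGranted = (speechStatus == .authorized) && micGranted
    }
}

// In the view, swap @State for @StateObject:
// @StateObject private var recognizer = SpeechRecognizer()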
How do I get punctuation in the transcript?
Set addsPunctuation = true on your
SFSpeechAudioBufferRecognitionRequest (available since iOS 16).
The server model inserts commas, periods, and question marks automatically based on prosody. The property may have no effect
when requiresOnDeviceRecognition = true on older devices.
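For example, right after creating the request in startRecording():

let request = SFSpeechAudioBufferRecognitionRequest()
request.shouldReportPartialResults = true
request.addsPunctuation = true // iOS 16+; punctuation appears in formattedString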
What's the UIKit equivalent?
The underlying Speech framework is UIKit-agnostic—use the exact same
SFSpeechRecognizer / AVAudioEngine
stack from a UIViewController. The only difference is that you update a
UILabel directly instead of writing to an
@Observable property. For a dictation UX that closely mirrors the iOS
keyboard, consider a UITextField and the UITextInput dictation
callbacks such as dictationRecordingDidEnd() instead.
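A minimal UIKit sketch, assuming permissions and the audio session are already configured as in the SwiftUI version above (DictationViewController is an illustrative name):

import UIKit
import Speech
import AVFoundation

final class DictationViewController: UIViewController {
    private let transcriptLabel = UILabel()
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    func startDictation() throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.request = request

        task = recognizer.recognitionTask(with: request) { [weak self] result, _ in
            guard let text = result?.bestTranscription.formattedString else { return }
            DispatchQueue.main.async {            // no @MainActor model here,
                self?.transcriptLabel.text = text // so hop to main manually
            }
        }

        let node = audioEngine.inputNode
        node.installTap(onBus: 0, bufferSize: 1024,
                        format: node.outputFormat(forBus: 0)) { buf, _ in
            request.append(buf)
        }
        audioEngine.prepare()
        try audioEngine.start()
    }
}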
Last reviewed: 2026-05-12 by the Soarias team.