How to Build a Voice Memo App in SwiftUI
A Voice Memo app lets users record audio, automatically transcribe it on-device using Apple's Speech framework, and search through their recordings by text. It's ideal for journalists, students, and anyone who needs a private, offline-first audio journal.
Prerequisites
- Mac with Xcode 16+
- Apple Developer Program ($99/year) — required for TestFlight and App Store
- Basic Swift/SwiftUI knowledge
- A physical iPhone for testing — AVAudioSession and SFSpeechRecognizer behave differently on the Simulator, and reliable microphone capture requires real hardware
- Familiarity with async/await — the Speech framework's modern API is fully async
Architecture overview
The app follows a single-store pattern: SwiftData owns the persistent VoiceMemo model objects and the SwiftUI views observe them via @Query. Two @Observable service classes live outside the model layer — AudioRecorder wraps AVFoundation and writes .m4a files to the app's documents directory, while TranscriptionService wraps SFSpeechRecognizer and writes the finished transcript back onto the memo. Playback is handled by a lightweight AudioPlayer observable that loads the file URL stored on the memo. There are no remote calls; everything runs on-device.
VoiceMemoApp/
├── App/
│   └── VoiceMemoApp.swift            # @main, modelContainer setup
├── Models/
│   └── VoiceMemo.swift               # @Model — id, title, fileURL, transcript, duration, createdAt
├── Services/
│   ├── AudioRecorder.swift           # @Observable — AVAudioSession + AVAudioRecorder
│   ├── TranscriptionService.swift    # @Observable — SFSpeechRecognizer async stream
│   └── AudioPlayer.swift             # @Observable — AVAudioPlayer + progress timer
├── Views/
│   ├── MemoListView.swift            # @Query list, record button
│   ├── MemoRowView.swift             # single row with waveform badge
│   ├── RecordingOverlayView.swift    # sheet shown while recording
│   └── MemoDetailView.swift          # full transcript + playback controls
├── Paywall/
│   └── SubscriptionView.swift        # StoreKit 2 paywall
└── PrivacyInfo.xcprivacy
Step-by-step
1. Project setup and permissions
Create a new Xcode project using the iOS App template with SwiftUI and SwiftData. Two Info.plist usage-description keys must be in place before any AVFoundation or Speech permission prompt will work — requesting a permission without its key crashes the app on the spot. A third entry, UIBackgroundModes, is optional but lets an in-progress recording keep running if the user backgrounds the app.
<!-- Info.plist additions -->
<key>NSMicrophoneUsageDescription</key>
<string>Voice Memo records audio using your microphone.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>Voice Memo transcribes your recordings on-device.</string>
<!-- Optional but recommended: lets an in-progress recording continue in the background -->
<key>UIBackgroundModes</key>
<array>
<string>audio</string>
</array>
// VoiceMemoApp.swift
import SwiftUI
import SwiftData
@main
struct VoiceMemoApp: App {
var body: some Scene {
WindowGroup {
MemoListView()
}
.modelContainer(for: VoiceMemo.self)
}
}
2. Data model with SwiftData
The VoiceMemo model stores everything needed to reconstruct the memo — the audio file's location as a string, the transcript, and the duration so the UI can display it without loading the audio file.
// Models/VoiceMemo.swift
import Foundation
import SwiftData
@Model
final class VoiceMemo {
var id: UUID
var title: String
var fileURLString: String // Stored as a String; the URL is rebuilt on demand below
var transcript: String
var duration: TimeInterval
var createdAt: Date
var isTranscribing: Bool
var fileURL: URL? {
URL(string: fileURLString)
}
init(title: String, fileURLString: String, duration: TimeInterval = 0) {
self.id = UUID()
self.title = title
self.fileURLString = fileURLString
self.transcript = ""
self.duration = duration
self.createdAt = Date()
self.isTranscribing = false
}
}
extension VoiceMemo {
/// Human-readable duration string, e.g. "1:24"
var durationString: String {
let minutes = Int(duration) / 60
let seconds = Int(duration) % 60
return String(format: "%d:%02d", minutes, seconds)
}
}
3. Core UI — memo list view
The list view uses @Query to subscribe to all memos sorted by creation date. A persistent floating button triggers the recording sheet. Swipe-to-delete removes the model object and the audio file together.
// Views/MemoListView.swift
import SwiftUI
import SwiftData
struct MemoListView: View {
@Query(sort: \VoiceMemo.createdAt, order: .reverse)
private var memos: [VoiceMemo]
@Environment(\.modelContext) private var context
@State private var isRecording = false
@State private var recorder = AudioRecorder()
var body: some View {
NavigationStack {
List {
ForEach(memos) { memo in
NavigationLink(value: memo) {
MemoRowView(memo: memo)
}
}
.onDelete(perform: deleteMemos)
}
.navigationTitle("Voice Memos")
.navigationDestination(for: VoiceMemo.self) { memo in
MemoDetailView(memo: memo)
}
.overlay(alignment: .bottom) {
RecordButton(isRecording: $isRecording)
.padding(.bottom, 32)
}
}
.sheet(isPresented: $isRecording) {
RecordingOverlayView(recorder: recorder) { savedMemo in
context.insert(savedMemo)
}
}
}
private func deleteMemos(at offsets: IndexSet) {
for index in offsets {
let memo = memos[index]
if let url = memo.fileURL {
try? FileManager.default.removeItem(at: url)
}
context.delete(memo)
}
}
}
struct RecordButton: View {
@Binding var isRecording: Bool
var body: some View {
Button {
isRecording = true
} label: {
Image(systemName: "mic.circle.fill")
.font(.system(size: 64))
.foregroundStyle(.red)
.shadow(radius: 8)
}
}
}
#Preview {
MemoListView()
.modelContainer(for: VoiceMemo.self, inMemory: true)
}
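MemoRowView is referenced by the list and its preview but isn't shown in full in this guide; here is a minimal sketch so the sample compiles, with the exact layout left as an assumption. MemoDetailView follows in step 6 together with the player.
// Views/MemoRowView.swift: minimal sketch, adjust the layout to taste
import SwiftUI

struct MemoRowView: View {
    let memo: VoiceMemo

    var body: some View {
        VStack(alignment: .leading, spacing: 4) {
            Text(memo.title)
                .font(.headline)
            HStack(spacing: 6) {
                Image(systemName: "waveform")
                Text(memo.durationString)
                if memo.isTranscribing {
                    Text("Transcribing…")
                }
            }
            .font(.caption)
            .foregroundStyle(.secondary)
        }
        .padding(.vertical, 4)
    }
}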
4. Audio recording with AVFoundation
AudioRecorder is an @Observable class that activates the audio session, starts the recorder and hands back the destination file URL, and reports the recording duration when stopped. The .playAndRecord category lets the same session handle both recording and later playback, and .defaultToSpeaker routes output to the loudspeaker instead of the earpiece.
// Services/AudioRecorder.swift
import AVFoundation
import Observation
@Observable
final class AudioRecorder: NSObject {
var isRecording = false
var currentLevel: Float = 0 // –160…0 dB, used for waveform UI
var elapsedTime: TimeInterval = 0
private var recorder: AVAudioRecorder?
private var timer: Timer?
private var startDate: Date?
func start() throws -> URL {
let session = AVAudioSession.sharedInstance()
try session.setCategory(.playAndRecord, options: [.defaultToSpeaker, .allowBluetooth])
try session.setActive(true)
let url = documentsURL().appendingPathComponent("\(UUID().uuidString).m4a")
let settings: [String: Any] = [
AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
AVSampleRateKey: 44_100,
AVNumberOfChannelsKey: 1,
AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
]
recorder = try AVAudioRecorder(url: url, settings: settings)
recorder?.isMeteringEnabled = true
recorder?.record()
startDate = Date()
isRecording = true
timer = Timer.scheduledTimer(withTimeInterval: 0.1, repeats: true) { [weak self] _ in
self?.tick()
}
return url
}
func stop() -> TimeInterval {
timer?.invalidate(); timer = nil
recorder?.stop()
isRecording = false
let duration = Date().timeIntervalSince(startDate ?? Date())
try? AVAudioSession.sharedInstance().setActive(false)
return duration
}
private func tick() {
recorder?.updateMeters()
currentLevel = recorder?.averagePower(forChannel: 0) ?? -160
elapsedTime = Date().timeIntervalSince(startDate ?? Date())
}
private func documentsURL() -> URL {
FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
}
}
5. On-device speech transcription
TranscriptionService uses SFSpeechRecognizer with a file-based request so the audio never leaves the device. The requiresOnDeviceRecognition flag enforces this — Apple's on-device models support English, Spanish, French, German, and several other locales as of iOS 17.
// Services/TranscriptionService.swift
import Speech
import Observation
@Observable
final class TranscriptionService {
var isAuthorized = false
func requestAuthorization() async {
// SFSpeechRecognizer only offers a completion-handler API here, so bridge it to async.
let status = await withCheckedContinuation { continuation in
SFSpeechRecognizer.requestAuthorization { continuation.resume(returning: $0) }
}
isAuthorized = (status == .authorized)
}
/// Transcribes the file at `url` and returns the best transcript string.
func transcribe(url: URL) async throws -> String {
guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
throw TranscriptionError.recognizerUnavailable
}
let request = SFSpeechURLRecognitionRequest(url: url)
request.requiresOnDeviceRecognition = true // no network call
request.shouldReportPartialResults = false
return try await withCheckedThrowingContinuation { continuation in
recognizer.recognitionTask(with: request) { result, error in
if let error {
continuation.resume(throwing: error)
return
}
guard let result, result.isFinal else { return }
continuation.resume(returning: result.bestTranscription.formattedString)
}
}
}
enum TranscriptionError: LocalizedError {
case recognizerUnavailable
var errorDescription: String? { "Speech recognizer is not available on this device." }
}
}
6. Playback and recording overlay
The recording overlay sheet ties together the recorder and transcription service — it starts recording on appear, shows a live level meter, and on stop kicks off async transcription before inserting the new memo into the model context.
// Views/RecordingOverlayView.swift
import SwiftUI
struct RecordingOverlayView: View {
let recorder: AudioRecorder
let onSave: (VoiceMemo) -> Void
@Environment(\.dismiss) private var dismiss
@State private var transcription = TranscriptionService()
@State private var savedURL: URL?
@State private var isTranscribing = false
@State private var errorMessage: String?
var body: some View {
VStack(spacing: 24) {
Spacer()
Text(recorder.isRecording ? timeString(recorder.elapsedTime) : "Done")
.font(.system(size: 48, weight: .thin, design: .monospaced))
// Live level meter
HStack(spacing: 3) {
ForEach(0..<20, id: \.self) { i in
let threshold = Float(-160 + i * 8)
RoundedRectangle(cornerRadius: 2)
.fill(recorder.currentLevel > threshold ? Color.red : Color.gray.opacity(0.3))
.frame(width: 8, height: 24)
}
}
if isTranscribing {
ProgressView("Transcribing…")
}
if let error = errorMessage {
Text(error).font(.caption).foregroundStyle(.red)
}
Button(role: .destructive) {
Task { await stopAndSave() }
} label: {
Label("Stop & Save", systemImage: "stop.circle.fill")
.font(.title3.bold())
}
.disabled(isTranscribing)
Spacer()
}
.padding()
.task { await startRecording() }
}
private func startRecording() async {
await transcription.requestAuthorization()
savedURL = try? recorder.start()
}
private func stopAndSave() async {
let duration = recorder.stop()
guard let url = savedURL else { dismiss(); return }
let formatter = DateFormatter()
formatter.dateStyle = .medium
formatter.timeStyle = .short
let memo = VoiceMemo(
title: "Memo \(formatter.string(from: Date()))",
fileURLString: url.absoluteString,
duration: duration
)
memo.isTranscribing = true
onSave(memo)
isTranscribing = true
do {
memo.transcript = try await transcription.transcribe(url: url)
} catch {
errorMessage = error.localizedDescription
}
memo.isTranscribing = false
isTranscribing = false
dismiss()
}
private func timeString(_ t: TimeInterval) -> String {
String(format: "%02d:%02d", Int(t) / 60, Int(t) % 60)
}
}
#Preview {
RecordingOverlayView(recorder: AudioRecorder()) { _ in }
}
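The playback half of this step, the AudioPlayer observable and the MemoDetailView listed in the architecture, isn't covered above. The sketch below is one way to fill it in under the same assumptions as the rest of the sample (file URL stored on the memo, iOS 17 Observation); the timer-based progress polling and the detail layout are choices, not requirements.
// Services/AudioPlayer.swift: minimal sketch
import AVFoundation
import Observation

@Observable
final class AudioPlayer: NSObject, AVAudioPlayerDelegate {
    var isPlaying = false
    var progress: Double = 0 // 0…1, drives the playback progress bar

    private var player: AVAudioPlayer?
    private var timer: Timer?

    func play(url: URL) throws {
        try AVAudioSession.sharedInstance().setCategory(.playback)
        try AVAudioSession.sharedInstance().setActive(true)

        player = try AVAudioPlayer(contentsOf: url)
        player?.delegate = self
        player?.play()
        isPlaying = true

        // Poll the playback position so the progress bar keeps moving.
        timer = Timer.scheduledTimer(withTimeInterval: 0.2, repeats: true) { [weak self] _ in
            guard let self, let player = self.player, player.duration > 0 else { return }
            self.progress = player.currentTime / player.duration
        }
    }

    func stop() {
        player?.stop()
        finish()
    }

    // AVAudioPlayerDelegate: fires when the file reaches its end.
    func audioPlayerDidFinishPlaying(_ player: AVAudioPlayer, successfully flag: Bool) {
        finish()
    }

    private func finish() {
        timer?.invalidate(); timer = nil
        isPlaying = false
        progress = 0
    }
}
MemoDetailView then shows the transcript with a progress bar and a single play/stop control:
// Views/MemoDetailView.swift: transcript plus basic playback
import SwiftUI

struct MemoDetailView: View {
    let memo: VoiceMemo
    @State private var player = AudioPlayer()

    var body: some View {
        ScrollView {
            Text(memo.transcript.isEmpty ? "No transcript yet." : memo.transcript)
                .frame(maxWidth: .infinity, alignment: .leading)
                .padding()
        }
        .navigationTitle(memo.title)
        .safeAreaInset(edge: .bottom) {
            VStack(spacing: 12) {
                ProgressView(value: player.progress)
                Button {
                    if player.isPlaying {
                        player.stop()
                    } else if let url = memo.fileURL {
                        try? player.play(url: url)
                    }
                } label: {
                    Image(systemName: player.isPlaying ? "stop.circle.fill" : "play.circle.fill")
                        .font(.system(size: 56))
                }
            }
            .padding()
        }
    }
}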
7. Privacy Manifest — required for App Store
Apps that call Apple's "required reason" APIs — for this app, file timestamps and UserDefaults — must declare why in a PrivacyInfo.xcprivacy file. Missing it can trigger an App Store review rejection. Add the file to the app target (not a framework target) using Xcode's App Privacy file template (File → New → File…).
<!-- PrivacyInfo.xcprivacy -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>NSPrivacyCollectedDataTypes</key>
<array/>
<key>NSPrivacyAccessedAPITypes</key>
<array>
<dict>
<key>NSPrivacyAccessedAPIType</key>
<string>NSPrivacyAccessedAPICategoryFileTimestamp</string>
<key>NSPrivacyAccessedAPITypeReasons</key>
<array>
<string>C617.1</string> <!-- File created by the app itself -->
</array>
</dict>
<dict>
<key>NSPrivacyAccessedAPIType</key>
<string>NSPrivacyAccessedAPICategoryUserDefaults</string>
<key>NSPrivacyAccessedAPITypeReasons</key>
<array>
<string>CA92.1</string> <!-- Read/write app's own defaults -->
</array>
</dict>
</array>
<key>NSPrivacyTracking</key>
<false/>
</dict>
</plist>
Common pitfalls
- AVAudioSession not deactivated after recording: If you forget to call setActive(false) after stopping, subsequent recordings in the same session will silently fail or produce empty files.
- Storing absolute file URLs in SwiftData: The app's sandbox container path can change between launches and app updates, so a persisted absoluteString may stop resolving. Reconstruct with URL(string:) defensively rather than force-unwrapping, or persist only the filename and rebuild the URL from the documents directory at read time.
- requiresOnDeviceRecognition on unsupported locales: On-device recognition is only available for certain locales. If the user's device locale isn't supported, the request will fail with an unhelpful error. Check SFSpeechRecognizer.supportedLocales() (or supportsOnDeviceRecognition) and fall back gracefully.
- App Store rejection for missing microphone justification: The App Store review team manually checks that your NSMicrophoneUsageDescription string clearly explains the user benefit — "this app uses the microphone" is routinely rejected. Be explicit: "Voice Memo uses the microphone to record audio notes."
- Transcript not appearing after background transcription: SwiftData model updates from a background Task must happen on the main actor. Wrap memo.transcript = assignments in await MainActor.run { … } if you refactor transcription into a detached task, as sketched after this list.
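A sketch of that last point; the helper's shape and the Task.detached refactor are assumptions, the MainActor.run hop is the part that matters:
// Hedged sketch: do the slow transcription work off the main actor,
// then hop back before mutating the SwiftData model.
func transcribeInBackground(_ memo: VoiceMemo, url: URL, using service: TranscriptionService) {
    Task.detached {
        let text = (try? await service.transcribe(url: url)) ?? ""
        await MainActor.run {
            memo.transcript = text
            memo.isTranscribing = false
        }
    }
}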
Adding monetization: Subscription
Voice Memo apps map well to a freemium subscription model: free users get a limited number of recordings (e.g., 10 total) or a cap on recording length, while subscribers unlock unlimited recordings, iCloud sync, and export to text. Implement this with StoreKit 2 — define a Product with type .autoRenewable in App Store Connect, then use Product.products(for: ["com.yourapp.pro_monthly"]) in a SubscriptionManager observable. Gate features by checking Transaction.currentEntitlements on app launch and in a .task modifier on gated views. The StoreKit Testing in Xcode (local StoreKit configuration file) lets you test subscription flows without a live App Store product.
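A minimal SubscriptionManager along those lines might look like the sketch below; the product identifier is the placeholder from the paragraph above, and error handling plus the .userCancelled/.pending purchase cases are trimmed for brevity.
// Paywall/SubscriptionManager.swift: hedged sketch of the StoreKit 2 flow
import StoreKit
import Observation

@Observable
final class SubscriptionManager {
    var isSubscribed = false
    private let productID = "com.yourapp.pro_monthly" // placeholder identifier

    /// Check current entitlements; call at launch and from a .task on gated views.
    func refreshEntitlements() async {
        for await entitlement in Transaction.currentEntitlements {
            if case .verified(let transaction) = entitlement,
               transaction.productID == productID {
                isSubscribed = true
                return
            }
        }
        isSubscribed = false
    }

    /// Load the subscription product and start a purchase.
    func purchase() async throws {
        guard let product = try await Product.products(for: [productID]).first else { return }
        let result = try await product.purchase()
        if case .success(.verified(let transaction)) = result {
            await transaction.finish()
            isSubscribed = true
        }
    }
}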
Shipping this faster with Soarias
Soarias handles the tedious parts of the intermediate-complexity checklist: project scaffolding with the correct AVFoundation/Speech entitlements pre-wired, automatic generation of the PrivacyInfo.xcprivacy file with the right reason codes for file timestamp and UserDefaults access, fastlane lane configuration for TestFlight and App Store distribution, and uploading screenshots taken directly from your Simulator runs. That removes at minimum the project setup step, the entire Privacy Manifest step, and the manual ASC metadata entry from your week.
For an intermediate project like this Voice Memo app, most developers spend one to two days on Xcode project configuration, permission strings, fastlane setup, and App Store Connect metadata — work that has nothing to do with AVFoundation or SwiftUI. Soarias compresses that to under an hour, so your week-long build stays focused on the recording and transcription logic that actually makes the app yours.
FAQ
Does this work on iOS 16?
The code as written requires iOS 17 because it uses the @Observable macro (introduced in iOS 17) and SwiftData. You can back-port the observables by replacing @Observable with ObservableObject/@Published and #Preview with a PreviewProvider, but SwiftData also requires iOS 17, so you would need to swap in Core Data for iOS 16 support.
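A sketch of what the back-ported recorder surface would look like, assuming the recording logic itself stays the same:
// iOS 16 back-port sketch: ObservableObject replaces the @Observable macro
import AVFoundation
import Combine

final class AudioRecorder: NSObject, ObservableObject {
    @Published var isRecording = false
    @Published var currentLevel: Float = 0
    @Published var elapsedTime: TimeInterval = 0
    // ...recording logic identical to the iOS 17 version
}

// Views then hold it with @StateObject instead of @State:
// @StateObject private var recorder = AudioRecorder()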
Do I need a paid Apple Developer account to test?
No — you can sideload to your own device with a free Apple ID via Xcode. However, AVAudioSession and SFSpeechRecognizer both require real hardware (the Simulator won't request microphone access), so you do need a physical iPhone. A paid account ($99/year) is only required for TestFlight distribution and App Store submission.
How do I add this to the App Store?
Create an app record in App Store Connect, archive your build in Xcode (Product → Archive), upload via Organizer or xcrun altool, fill in the required metadata (screenshots, privacy labels, description), and submit for review. Plan for a 24–48 hour review window. The Privacy Manifest and microphone usage description are the two most common causes of rejection for audio apps — make sure both are in place before submitting.
How do I handle users whose device language doesn't support on-device transcription?
Check SFSpeechRecognizer(locale: Locale.current)?.supportsOnDeviceRecognition at runtime. If it returns false, you have two options: remove requiresOnDeviceRecognition = true to allow server-side recognition (which requires network access and Apple's servers), or show a localized message explaining that transcription isn't supported for the user's language and disable the feature gracefully. Never silently fail — users will assume it's a bug.
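A sketch of that check; the helper name is illustrative, and in practice you would fold it into TranscriptionService:
import Speech

/// Build a file-based request, forcing on-device recognition only when the
/// current locale's offline model is actually available.
func makeRecognitionRequest(for url: URL) -> SFSpeechURLRecognitionRequest {
    let request = SFSpeechURLRecognitionRequest(url: url)
    let onDeviceSupported = SFSpeechRecognizer(locale: .current)?.supportsOnDeviceRecognition ?? false
    // If unsupported, either allow server-side recognition (leave the flag false)
    // or disable transcription in the UI with a clear explanation.
    request.requiresOnDeviceRecognition = onDeviceSupported
    return request
}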
Last reviewed: 2026-05-11 by the Soarias team.