How to Build a Voice Memo App in SwiftUI
A Voice Memo app lets users record audio, automatically transcribe it on-device using Apple's Speech framework, and search through their recordings by text. It's ideal for journalists, students, and anyone who needs a private, offline-first audio journal.
Prerequisites
- Mac with Xcode 16+
- Apple Developer Program ($99/year) — required for TestFlight and App Store
- Basic Swift/SwiftUI knowledge
- A physical iPhone for testing — AVAudioSession and SFSpeechRecognizer behave differently on the Simulator, and reliable microphone capture requires real hardware
- Familiarity with async/await — the Speech framework's modern API is fully async
Architecture overview
The app follows a single-store pattern: SwiftData owns the persistent VoiceMemo model objects and the SwiftUI views observe them via @Query. Two @Observable service classes live outside the model layer — AudioRecorder wraps AVFoundation and writes .m4a files to the app's documents directory, while TranscriptionService wraps SFSpeechRecognizer and writes the finished transcript back onto the memo. Playback is handled by a lightweight AudioPlayer observable that loads the file URL stored on the memo. There are no remote calls; everything runs on-device.
VoiceMemoApp/
├── App/
│   └── VoiceMemoApp.swift            # @main, modelContainer setup
├── Models/
│   └── VoiceMemo.swift               # @Model — id, title, fileURL, transcript, duration, createdAt
├── Services/
│   ├── AudioRecorder.swift           # @Observable — AVAudioSession + AVAudioRecorder
│   ├── TranscriptionService.swift    # @Observable — SFSpeechRecognizer async stream
│   └── AudioPlayer.swift             # @Observable — AVAudioPlayer + progress timer
├── Views/
│   ├── MemoListView.swift            # @Query list, record button
│   ├── MemoRowView.swift             # single row with waveform badge
│   ├── RecordingOverlayView.swift    # sheet shown while recording
│   └── MemoDetailView.swift          # full transcript + playback controls
├── Paywall/
│   └── SubscriptionView.swift        # StoreKit 2 paywall
└── PrivacyInfo.xcprivacy
Step-by-step
1. Project setup and permissions
Create a new Xcode project using the iOS App template with SwiftUI and SwiftData. Two Info.plist usage-description keys must be in place before any AVFoundation or Speech permission prompt will work — requesting a permission without its key crashes the app on the spot. A third entry, UIBackgroundModes, is optional but lets an in-progress recording keep running if the user backgrounds the app.
<!-- Info.plist additions -->
<key>NSMicrophoneUsageDescription</key>
<string>Voice Memo records audio using your microphone.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>Voice Memo transcribes your recordings on-device.</string>
<!-- Optional but recommended: lets an in-progress recording continue in the background -->
<key>UIBackgroundModes</key>
<array>
<string>audio</string>
</array>
// VoiceMemoApp.swift
import SwiftUI
import SwiftData
@main
struct VoiceMemoApp: App {
var body: some Scene {
WindowGroup {
MemoListView()
}
.modelContainer(for: VoiceMemo.self)
}
}
2. Data model with SwiftData
The VoiceMemo model stores everything needed to reconstruct the memo — the audio file's location as a string, the transcript, and the duration so the UI can display it without loading the audio file.
// Models/VoiceMemo.swift
import Foundation
import SwiftData
@Model
final class VoiceMemo {
var id: UUID
var title: String
var fileURLString: String // Stored as a String; the URL is rebuilt on demand below
var transcript: String
var duration: TimeInterval
var createdAt: Date
var isTranscribing: Bool
var fileURL: URL? {
URL(string: fileURLString)
}
init(title: String, fileURLString: String, duration: TimeInterval = 0) {
self.id = UUID()
self.title = title
self.fileURLString = fileURLString
self.transcript = ""
self.duration = duration
self.createdAt = Date()
self.isTranscribing = false
}
}
extension VoiceMemo {
/// Human-readable duration string, e.g. "1:24"
var durationString: String {
let minutes = Int(duration) / 60
let seconds = Int(duration) % 60
return String(format: "%d:%02d", minutes, seconds)
}
}
3. Core UI — memo list view
The list view uses @Query to subscribe to all memos sorted by creation date. A persistent floating button triggers the recording sheet. Swipe-to-delete removes the model object and the audio file together.
// Views/MemoListView.swift
import SwiftUI
import SwiftData
struct MemoListView: View {
@Query(sort: \VoiceMemo.createdAt, order: .reverse)
private var memos: [VoiceMemo]
@Environment(\.modelContext) private var context
@State private var isRecording = false
@State private var recorder = AudioRecorder()
var body: some View {
NavigationStack {
List {
ForEach(memos) { memo in
NavigationLink(value: memo) {
MemoRowView(memo: memo)
}
}
.onDelete(perform: deleteMemos)
}
.navigationTitle("Voice Memos")
.navigationDestination(for: VoiceMemo.self) { memo in
MemoDetailView(memo: memo)
}
.overlay(alignment: .bottom) {
RecordButton(isRecording: $isRecording)
.padding(.bottom, 32)
}
}
.sheet(isPresented: $isRecording) {
RecordingOverlayView(recorder: recorder) { savedMemo in
context.insert(savedMemo)
}
}
}
private func deleteMemos(at offsets: IndexSet) {
for index in offsets {
let memo = memos[index]
if let url = memo.fileURL {
try? FileManager.default.removeItem(at: url)
}
context.delete(memo)
}
}
}
struct RecordButton: View {
@Binding var isRecording: Bool
var body: some View {
Button {
isRecording = true
} label: {
Image(systemName: "mic.circle.fill")
.font(.system(size: 64))
.foregroundStyle(.red)
.shadow(radius: 8)
}
}
}
#Preview {
MemoListView()
.modelContainer(for: VoiceMemo.self, inMemory: true)
}
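MemoRowView is referenced by the list and its preview but isn't shown in full in this guide; here is a minimal sketch so the sample compiles, with the exact layout left as an assumption. MemoDetailView follows in step 6 together with the player.
// Views/MemoRowView.swift: minimal sketch, adjust the layout to taste
import SwiftUI

struct MemoRowView: View {
    let memo: VoiceMemo

    var body: some View {
        VStack(alignment: .leading, spacing: 4) {
            Text(memo.title)
                .font(.headline)
            HStack(spacing: 6) {
                Image(systemName: "waveform")
                Text(memo.durationString)
                if memo.isTranscribing {
                    Text("Transcribing…")
                }
            }
            .font(.caption)
            .foregroundStyle(.secondary)
        }
        .padding(.vertical, 4)
    }
}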
4. Audio recording with AVFoundation
AudioRecorder is an @Observable class that activates the audio session, starts the recorder and hands back the destination file URL, and reports the recording duration when stopped. The .playAndRecord category lets the same session handle both recording and later playback, and .defaultToSpeaker routes output to the loudspeaker instead of the earpiece.
// Services/AudioRecorder.swift
import AVFoundation
import Observation
@Observable
final class AudioRecorder: NSObject {
var isRecording = false
var currentLevel: Float = 0 // –160…0 dB, used for waveform UI
var elapsedTime: TimeInterval = 0
private var recorder: AVAudioRecorder?
private var timer: Timer?
private var startDate: Date?
func start() throws -> URL {
let session = AVAudioSession.sharedInstance()
try session.setCategory(.playAndRecord, options: [.defaultToSpeaker, .allowBluetooth])
try session.setActive(true)
let url = documentsURL().appendingPathComponent("\(UUID().uuidString).m4a")
let settings: [String: Any] = [
AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
AVSampleRateKey: 44_100,
AVNumberOfChannelsKey: 1,
AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
]
recorder = try AVAudioRecorder(url: url, settings: settings)
recorder?.isMeteringEnabled = true
recorder?.record()
startDate = Date()
isRecording = true
timer = Timer.scheduledTimer(withTimeInterval: 0.1, repeats: true) { [weak self] _ in
self?.tick()
}
return url
}
func stop() -> TimeInterval {
timer?.invalidate(); timer = nil
recorder?.stop()
isRecording = false
let duration = Date().timeIntervalSince(startDate ?? Date())
try? AVAudioSession.sharedInstance().setActive(false)
return duration
}
private func tick() {
recorder?.updateMeters()
currentLevel = recorder?.averagePower(forChannel: 0) ?? -160
elapsedTime = Date().timeIntervalSince(startDate ?? Date())
}
private func documentsURL() -> URL {
FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
}
}
5. On-device speech transcription
TranscriptionService uses SFSpeechRecognizer with a file-based request so the audio never leaves the device. The requiresOnDeviceRecognition flag enforces this — Apple's on-device models support English, Spanish, French, German, and several other locales as of iOS 17.
// Services/TranscriptionService.swift
import Speech
import Observation
@Observable
final class TranscriptionService {
var isAuthorized = false
func requestAuthorization() async {
// SFSpeechRecognizer only offers a completion-handler API here, so bridge it to async.
let status = await withCheckedContinuation { continuation in
SFSpeechRecognizer.requestAuthorization { continuation.resume(returning: $0) }
}
isAuthorized = (status == .authorized)
}
/// Transcribes the file at `url` and returns the best transcript string.
func transcribe(url: URL) async throws -> String {
guard let recognizer = SFSpeechRecognizer(), recognizer.isAvailable else {
throw TranscriptionError.recognizerUnavailable
}
let request = SFSpeechURLRecognitionRequest(url: url)
request.requiresOnDeviceRecognition = true // no network call
request.shouldReportPartialResults = false
return try await withCheckedThrowingContinuation { continuation in
recognizer.recognitionTask(with: request) { result, error in
if let error {
continuation.resume(throwing: error)
return
}
guard let result, result.isFinal else { return }
continuation.resume(returning: result.bestTranscription.formattedString)
}
}
}
enum TranscriptionError: LocalizedError {
case recognizerUnavailable
var errorDescription: String? { "Speech recognizer is not available on this device." }
}
}
6. Playback and recording overlay
The recording overlay sheet ties together the recorder and transcription service — it starts recording on appear, shows a live level meter, and on stop kicks off async transcription before inserting the new memo into the model context.
// Views/RecordingOverlayView.swift
import SwiftUI
struct RecordingOverlayView: View {
let recorder: AudioRecorder
let onSave: (VoiceMemo) -> Void
@Environment(\.dismiss) private var dismiss
@State private var transcription = TranscriptionService()
@State private var savedURL: URL?
@State private var isTranscribing = false
@State private var errorMessage: String?
var body: some View {
VStack(spacing: 24) {
Spacer()
Text(recorder.isRecording ? timeString(recorder.elapsedTime) : "Done")
.font(.system(size: 48, weight: .thin, design: .monospaced))
// Live level meter
HStack(spacing: 3) {
ForEach(0..<20, id: \.self) { i in
let threshold = Float(-160 + i * 8)
RoundedRectangle(cornerRadius: 2)
.fill(recorder.currentLevel > threshold ? Color.red : Color.gray.opacity(0.3))
.frame(width: 8, height: 24)
}
}
if isTranscribing {
ProgressView("Transcribing…")
}
if let error = errorMessage {
Text(error).font(.caption).foregroundStyle(.red)
}
Button(role: .destructive) {
Task { await stopAndSave() }
} label: {
Label("Stop & Save", systemImage: "stop.circle.fill")
.font(.title3.bold())
}
.disabled(isTranscribing)
Spacer()
}
.padding()
.task { await startRecording() }
}
private func startRecording() async {
await transcription.requestAuthorization()
savedURL = try? recorder.start()
}
private func stopAndSave() async {
let duration = recorder.stop()
guard let url = savedURL else { dismiss(); return }
let formatter = DateFormatter()
formatter.dateStyle = .medium
formatter.timeStyle = .short
let memo = VoiceMemo(
title: "Memo \(formatter.string(from: Date()))",
fileURLString: url.absoluteString,
duration: duration
)
memo.isTranscribing = true
onSave(memo)
isTranscribing = true
do {
memo.transcript = try await transcription.transcribe(url: url)
} catch {
errorMessage = error.localizedDescription
}
memo.isTranscribing = false
isTranscribing = false
dismiss()
}
private func timeString(_ t: TimeInterval) -> String {
String(format: "%02d:%02d", Int(t) / 60, Int(t) % 60)
}
}
#Preview {
RecordingOverlayView(recorder: AudioRecorder()) { _ in }
}
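The playback half of this step, the AudioPlayer observable and the MemoDetailView listed in the architecture, isn't covered above. The sketch below is one way to fill it in under the same assumptions as the rest of the sample (file URL stored on the memo, iOS 17 Observation); the timer-based progress polling and the detail layout are choices, not requirements.
// Services/AudioPlayer.swift: minimal sketch
import AVFoundation
import Observation

@Observable
final class AudioPlayer: NSObject, AVAudioPlayerDelegate {
    var isPlaying = false
    var progress: Double = 0 // 0…1, drives the playback progress bar

    private var player: AVAudioPlayer?
    private var timer: Timer?

    func play(url: URL) throws {
        try AVAudioSession.sharedInstance().setCategory(.playback)
        try AVAudioSession.sharedInstance().setActive(true)

        player = try AVAudioPlayer(contentsOf: url)
        player?.delegate = self
        player?.play()
        isPlaying = true

        // Poll the playback position so the progress bar keeps moving.
        timer = Timer.scheduledTimer(withTimeInterval: 0.2, repeats: true) { [weak self] _ in
            guard let self, let player = self.player, player.duration > 0 else { return }
            self.progress = player.currentTime / player.duration
        }
    }

    func stop() {
        player?.stop()
        finish()
    }

    // AVAudioPlayerDelegate: fires when the file reaches its end.
    func audioPlayerDidFinishPlaying(_ player: AVAudioPlayer, successfully flag: Bool) {
        finish()
    }

    private func finish() {
        timer?.invalidate(); timer = nil
        isPlaying = false
        progress = 0
    }
}
MemoDetailView then shows the transcript with a progress bar and a single play/stop control:
// Views/MemoDetailView.swift: transcript plus basic playback
import SwiftUI

struct MemoDetailView: View {
    let memo: VoiceMemo
    @State private var player = AudioPlayer()

    var body: some View {
        ScrollView {
            Text(memo.transcript.isEmpty ? "No transcript yet." : memo.transcript)
                .frame(maxWidth: .infinity, alignment: .leading)
                .padding()
        }
        .navigationTitle(memo.title)
        .safeAreaInset(edge: .bottom) {
            VStack(spacing: 12) {
                ProgressView(value: player.progress)
                Button {
                    if player.isPlaying {
                        player.stop()
                    } else if let url = memo.fileURL {
                        try? player.play(url: url)
                    }
                } label: {
                    Image(systemName: player.isPlaying ? "stop.circle.fill" : "play.circle.fill")
                        .font(.system(size: 56))
                }
            }
            .padding()
        }
    }
}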
7. Privacy Manifest — required for App Store
Apps that call Apple's "required reason" APIs — for this app, file timestamps and UserDefaults — must declare why in a PrivacyInfo.xcprivacy file. Missing it can trigger an App Store review rejection. Add the file to the app target (not a framework target) using Xcode's App Privacy file template (File → New → File…).
<!-- PrivacyInfo.xcprivacy -->
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>NSPrivacyCollectedDataTypes</key>
<array/>
<key>NSPrivacyAccessedAPITypes</key>
<array>
<dict>
<key>NSPrivacyAccessedAPIType</key>
<string>NSPrivacyAccessedAPICategoryFileTimestamp</string>
<key>NSPrivacyAccessedAPITypeReasons</key>
<array>
<string>C617.1</string> <!-- File created by the app itself -->
</array>
</dict>
<dict>
<key>NSPrivacyAccessedAPIType</key>
<string>NSPrivacyAccessedAPICategoryUserDefaults</string>
<key>NSPrivacyAccessedAPITypeReasons</key>
<array>
<string>CA92.1</string> <!-- Read/write app's own defaults -->
</array>
</dict>
</array>
<key>NSPrivacyTracking</key>
<false/>
</dict>
</plist>
Common pitfalls
- AVAudioSession not deactivated after recording: If you forget to call setActive(false) after stopping, subsequent recordings in the same session will silently fail or produce empty files.
- Storing absolute file URLs in SwiftData: The app's sandbox container path can change between launches and app updates, so a persisted absoluteString may stop resolving. Reconstruct with URL(string:) defensively rather than force-unwrapping, or persist only the filename and rebuild the URL from the documents directory at read time.
- requiresOnDeviceRecognition on unsupported locales: On-device recognition is only available for certain locales. If the user's device locale isn't supported, the request will fail with an unhelpful error. Check SFSpeechRecognizer.supportedLocales() (or supportsOnDeviceRecognition) and fall back gracefully.
- App Store rejection for missing microphone justification: The App Store review team manually checks that your NSMicrophoneUsageDescription string clearly explains the user benefit — "this app uses the microphone" is routinely rejected. Be explicit: "Voice Memo uses the microphone to record audio notes."
- Transcript not appearing after background transcription: SwiftData model updates from a background Task must happen on the main actor. Wrap memo.transcript = assignments in await MainActor.run { … } if you refactor transcription into a detached task, as sketched after this list.
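A sketch of that last point; the helper's shape and the Task.detached refactor are assumptions, the MainActor.run hop is the part that matters:
// Hedged sketch: do the slow transcription work off the main actor,
// then hop back before mutating the SwiftData model.
func transcribeInBackground(_ memo: VoiceMemo, url: URL, using service: TranscriptionService) {
    Task.detached {
        let text = (try? await service.transcribe(url: url)) ?? ""
        await MainActor.run {
            memo.transcript = text
            memo.isTranscribing = false
        }
    }
}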
Adding monetization: Subscription
Voice Memo apps map well to a freemium subscription model: free users get a limited number of recordings (e.g., 10 total) or a cap on recording length, while subscribers unlock unlimited recordings, iCloud sync, and export to text. Implement this with StoreKit 2 — define a Product with type .autoRenewable in App Store Connect, then use Product.products(for: ["com.yourapp.pro_monthly"]) in a SubscriptionManager observable. Gate features by checking Transaction.currentEntitlements on app launch and in a .task modifier on gated views. The StoreKit Testing in Xcode (local StoreKit configuration file) lets you test subscription flows without a live App Store product.
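A minimal SubscriptionManager along those lines might look like the sketch below; the product identifier is the placeholder from the paragraph above, and error handling plus the .userCancelled/.pending purchase cases are trimmed for brevity.
// Paywall/SubscriptionManager.swift: hedged sketch of the StoreKit 2 flow
import StoreKit
import Observation

@Observable
final class SubscriptionManager {
    var isSubscribed = false
    private let productID = "com.yourapp.pro_monthly" // placeholder identifier

    /// Check current entitlements; call at launch and from a .task on gated views.
    func refreshEntitlements() async {
        for await entitlement in Transaction.currentEntitlements {
            if case .verified(let transaction) = entitlement,
               transaction.productID == productID {
                isSubscribed = true
                return
            }
        }
        isSubscribed = false
    }

    /// Load the subscription product and start a purchase.
    func purchase() async throws {
        guard let product = try await Product.products(for: [productID]).first else { return }
        let result = try await product.purchase()
        if case .success(.verified(let transaction)) = result {
            await transaction.finish()
            isSubscribed = true
        }
    }
}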
Shipping this faster with Soarias
Soarias handles the tedious parts of the intermediate-complexity checklist: project scaffolding with the correct AVFoundation/Speech entitlements pre-wired, automatic generation of the PrivacyInfo.xcprivacy file with the right reason codes for file timestamp and UserDefaults access, fastlane lane configuration for TestFlight and App Store distribution, and uploading screenshots taken directly from your Simulator runs. That removes at minimum the project setup step, the entire Privacy Manifest step, and the manual ASC metadata entry from your week.
For an intermediate project like this Voice Memo app, most developers spend one to two days on Xcode project configuration, permission strings, fastlane setup, and App Store Connect metadata — work that has nothing to do with AVFoundation or SwiftUI. Soarias compresses that to under an hour, so your week-long build stays focused on the recording and transcription logic that actually makes the app yours.
FAQ
Does this work on iOS 16?
The code as written requires iOS 17 because it uses the @Observable macro (introduced in iOS 17) and SwiftData. You can back-port the observables by replacing @Observable with ObservableObject/@Published and #Preview with a PreviewProvider, but SwiftData also requires iOS 17, so you would need to swap in Core Data for iOS 16 support.
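A sketch of what the back-ported recorder surface would look like, assuming the recording logic itself stays the same:
// iOS 16 back-port sketch: ObservableObject replaces the @Observable macro
import AVFoundation
import Combine

final class AudioRecorder: NSObject, ObservableObject {
    @Published var isRecording = false
    @Published var currentLevel: Float = 0
    @Published var elapsedTime: TimeInterval = 0
    // ...recording logic identical to the iOS 17 version
}

// Views then hold it with @StateObject instead of @State:
// @StateObject private var recorder = AudioRecorder()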
Do I need a paid Apple Developer account to test?
No — you can sideload to your own device with a free Apple ID via Xcode. However, AVAudioSession and SFSpeechRecognizer both require real hardware (the Simulator won't request microphone access), so you do need a physical iPhone. A paid account ($99/year) is only required for TestFlight distribution and App Store submission.
How do I add this to the App Store?
Create an app record in App Store Connect, archive your build in Xcode (Product → Archive), upload via Organizer or xcrun altool, fill in the required metadata (screenshots, privacy labels, description), and submit for review. Plan for a 24–48 hour review window. The Privacy Manifest and microphone usage description are the two most common causes of rejection for audio apps — make sure both are in place before submitting.
How do I handle users whose device language doesn't support on-device transcription?
Check SFSpeechRecognizer(locale: Locale.current)?.supportsOnDeviceRecognition at runtime. If it returns false, you have two options: remove requiresOnDeviceRecognition = true to allow server-side recognition (which requires network access and Apple's servers), or show a localized message explaining that transcription isn't supported for the user's language and disable the feature gracefully. Never silently fail — users will assume it's a bug.
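A sketch of that check; the helper name is illustrative, and in practice you would fold it into TranscriptionService:
import Speech

/// Build a file-based request, forcing on-device recognition only when the
/// current locale's offline model is actually available.
func makeRecognitionRequest(for url: URL) -> SFSpeechURLRecognitionRequest {
    let request = SFSpeechURLRecognitionRequest(url: url)
    let onDeviceSupported = SFSpeechRecognizer(locale: .current)?.supportsOnDeviceRecognition ?? false
    // If unsupported, either allow server-side recognition (leave the flag false)
    // or disable transcription in the UI with a clear explanation.
    request.requiresOnDeviceRecognition = onDeviceSupported
    return request
}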
Last reviewed: 2026-05-11 by the Soarias team.