How to Build On-Device ML in SwiftUI
Drag a .mlmodel into Xcode, wrap it in a VNCoreMLModel, run a VNCoreMLRequest off the main actor, and stream VNClassificationObservation results into an @Observable class. The whole pipeline runs entirely on-device.
import CoreML
import Observation
import UIKit
import Vision

@Observable
final class Classifier {
    var labels: [(String, Float)] = []

    func classify(_ uiImage: UIImage) async throws {
        // Wrap the Xcode-generated CoreML class for use with Vision.
        let vnModel = try VNCoreMLModel(for: MobileNetV2().model)
        let request = VNCoreMLRequest(model: vnModel)
        guard let cgImage = uiImage.cgImage else { return }
        // perform(_:) is synchronous, so call classify(_:) off the main thread.
        try VNImageRequestHandler(cgImage: cgImage).perform([request])
        labels = (request.results as? [VNClassificationObservation] ?? [])
            .prefix(5)
            .map { ($0.identifier, $0.confidence) }
    }
}
Full implementation
The example below builds a self-contained image-classification screen. The user picks a photo from the library via PhotosPicker; the Classifier class (isolated to the main actor) hands the CoreML pipeline to a detached task so the UI stays interactive throughout. Results are rendered as a list of labeled progress bars so confidence scores are immediately legible.
import SwiftUI
import CoreML
import Vision
import PhotosUI
// MARK: - Model
@Observable
@MainActor
final class Classifier {
var labels: [(label: String, confidence: Float)] = []
var isRunning = false
var errorMessage: String?
func classify(_ uiImage: UIImage) async {
isRunning = true
errorMessage = nil
defer { isRunning = false }
do {
// Move heavy work off the main actor
let results: [(String, Float)] = try await Task.detached(priority: .userInitiated) {
let configuration = MLModelConfiguration()
configuration.computeUnits = .all // CPU + Neural Engine + GPU
let vnModel = try VNCoreMLModel(
for: MobileNetV2(configuration: configuration).model
)
let request = VNCoreMLRequest(model: vnModel)
request.imageCropAndScaleOption = .centerCrop
guard let cgImage = uiImage.cgImage else {
throw ClassifierError.invalidImage
}
try VNImageRequestHandler(cgImage: cgImage, options: [:])
.perform([request])
return (request.results as? [VNClassificationObservation] ?? [])
.prefix(5)
.map { ($0.identifier.capitalized, $0.confidence) }
}.value
labels = results.map { (label: $0.0, confidence: $0.1) }
} catch {
errorMessage = error.localizedDescription
}
}
}
enum ClassifierError: LocalizedError {
case invalidImage
var errorDescription: String? { "Could not convert image to CGImage." }
}
// MARK: - View
struct OnDeviceMLView: View {
@State private var classifier = Classifier()
@State private var selectedItem: PhotosPickerItem?
@State private var displayImage: Image?
var body: some View {
NavigationStack {
ScrollView {
VStack(spacing: 24) {
// Photo picker
PhotosPicker(
selection: $selectedItem,
matching: .images,
photoLibrary: .shared()
) {
ZStack {
RoundedRectangle(cornerRadius: 16)
.fill(Color(.secondarySystemBackground))
.frame(height: 260)
if let displayImage {
displayImage
.resizable()
.scaledToFill()
.frame(height: 260)
.clipShape(RoundedRectangle(cornerRadius: 16))
} else {
Label("Choose a photo", systemImage: "photo.badge.plus")
.foregroundStyle(.secondary)
}
}
}
.onChange(of: selectedItem) { _, newItem in
Task {
guard let newItem,
let data = try? await newItem.loadTransferable(type: Data.self),
let uiImage = UIImage(data: data) else { return }
displayImage = Image(uiImage: uiImage)
await classifier.classify(uiImage)
}
}
.accessibilityLabel("Photo picker")
// Results
if classifier.isRunning {
ProgressView("Classifying…")
.progressViewStyle(.circular)
} else if let error = classifier.errorMessage {
Text(error)
.foregroundStyle(.red)
.font(.caption)
} else if !classifier.labels.isEmpty {
ResultsGrid(labels: classifier.labels)
}
Spacer()
}
.padding()
}
.navigationTitle("On-Device ML")
}
}
}
struct ResultsGrid: View {
let labels: [(label: String, confidence: Float)]
var body: some View {
VStack(alignment: .leading, spacing: 12) {
Text("Top predictions")
.font(.headline)
ForEach(labels, id: \.label) { item in
VStack(alignment: .leading, spacing: 4) {
HStack {
Text(item.label)
.font(.subheadline)
Spacer()
Text(String(format: "%.1f%%", item.confidence * 100))
.font(.caption.monospacedDigit())
.foregroundStyle(.secondary)
}
ProgressView(value: Double(item.confidence))
.tint(confidenceColor(item.confidence))
.accessibilityLabel("\(item.label): \(Int(item.confidence * 100)) percent confidence")
}
}
}
.padding()
.background(Color(.secondarySystemBackground), in: RoundedRectangle(cornerRadius: 14))
}
private func confidenceColor(_ c: Float) -> Color {
c > 0.7 ? .green : c > 0.4 ? .orange : .red
}
}
#Preview {
OnDeviceMLView()
}
How it works
- MLModelConfiguration with .computeUnits = .all: CoreML automatically routes work across the CPU, GPU, and Apple Neural Engine based on the model's operator graph. Forcing .cpuOnly is slower on modern chips; leave it as .all unless you're benchmarking (see the sketch after this list).
- Task.detached(priority: .userInitiated): even though Classifier is @MainActor, the heavy prediction work runs on a detached task so frames never drop. Awaiting .value bridges the result back to the main actor safely.
- request.imageCropAndScaleOption = .centerCrop: MobileNetV2 expects a 224×224 square input. .centerCrop tells Vision to crop from the center rather than letterbox, which matches how the model was trained and improves accuracy.
- @Observable with defer { isRunning = false }: the defer block guarantees the spinner disappears even if an error is thrown mid-classification, keeping the UI consistent without duplicating cleanup in every catch branch.
- ProgressView confidence bars: each VNClassificationObservation.confidence is already a Float in the 0–1 range, so it maps directly to ProgressView(value:) without any normalization math.
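If you do want to benchmark individual compute units, only the configuration changes. A minimal sketch, assuming the same bundled MobileNetV2 model (the makeModel helper name is illustrative, not part of the example above):

import CoreML
import Vision

/// Builds a Vision-wrapped model pinned to a specific set of compute units,
/// handy for comparing CPU-only vs. Neural Engine latency.
func makeModel(computeUnits: MLComputeUnits) throws -> VNCoreMLModel {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = computeUnits  // .cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine, or .all
    return try VNCoreMLModel(for: MobileNetV2(configuration: configuration).model)
}

// Usage: time identical requests against different configurations.
// let cpuOnly = try makeModel(computeUnits: .cpuOnly)
// let everything = try makeModel(computeUnits: .all)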
Variants
Live camera feed classification with AVFoundation
Instead of a picked image, pipe CMSampleBuffer frames
directly to Vision. Because VNImageRequestHandler accepts
a CVPixelBuffer, no JPEG round-trip is needed — latency
drops to single-digit milliseconds on A15+ chips.
// In your AVCaptureVideoDataOutputSampleBufferDelegate:
func captureOutput(_ output: AVCaptureOutput,
didOutput sampleBuffer: CMSampleBuffer,
from connection: AVCaptureConnection) {
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
let request = VNCoreMLRequest(model: vnModel) { [weak self] req, _ in
let top = (req.results as? [VNClassificationObservation])?
.first
DispatchQueue.main.async {
self?.topLabel = top?.identifier.capitalized ?? "—"
self?.topConfidence = top?.confidence ?? 0
}
}
request.imageCropAndScaleOption = .centerCrop
try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
orientation: .right).perform([request])
}
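The delegate above assumes an AVCaptureSession is already delivering frames. A minimal sketch of that wiring, under the assumption that the delegate and its vnModel live elsewhere (FrameProvider is an illustrative name; the app also needs NSCameraUsageDescription in Info.plist):

import AVFoundation

/// Minimal capture pipeline that feeds frames to the delegate method above.
final class FrameProvider {
    let session = AVCaptureSession()
    private let videoQueue = DispatchQueue(label: "camera.frames")

    func configure(delegate: AVCaptureVideoDataOutputSampleBufferDelegate) {
        session.beginConfiguration()
        defer { session.commitConfiguration() }

        guard let device = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                   for: .video, position: .back),
              let input = try? AVCaptureDeviceInput(device: device),
              session.canAddInput(input) else { return }
        session.addInput(input)

        let output = AVCaptureVideoDataOutput()
        output.alwaysDiscardsLateVideoFrames = true  // drop stale frames instead of queueing them
        output.setSampleBufferDelegate(delegate, queue: videoQueue)
        if session.canAddOutput(output) { session.addOutput(output) }
    }
}

// Usage: provider.configure(delegate: self), then call
// provider.session.startRunning() from a background queue.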
Using a custom Create ML model
Train a model in Create ML (or export from PyTorch/TensorFlow via coremltools),
drag the generated .mlpackage into Xcode, and
replace MobileNetV2() with your class name — the rest of
the pipeline is identical. Xcode auto-generates the typed Swift wrapper on build, including input/output feature
descriptions so you get compile-time safety on feature names.
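For example, if Xcode generated a class named FlowerClassifier from your .mlpackage (a purely illustrative name; Xcode derives it from the file name), the only line that changes is the model-loading one:

// Before: the bundled MobileNetV2 model.
let vnModel = try VNCoreMLModel(for: MobileNetV2(configuration: configuration).model)

// After: your custom model. "FlowerClassifier" stands in for whatever
// class Xcode generates from the .mlpackage file name.
let vnModel = try VNCoreMLModel(for: FlowerClassifier(configuration: configuration).model)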
Common pitfalls
- Forgetting the photo library privacy key. The out-of-process PhotosPicker itself doesn't need authorization, but any direct PHPhotoLibrary access (fetching assets, saving edits) still requires NSPhotoLibraryUsageDescription in Info.plist; omit it and the app crashes with SIGABRT the moment the library is accessed.
- Running Vision requests on the main thread. VNImageRequestHandler.perform(_:) is synchronous and blocks its calling thread for tens to hundreds of milliseconds. Always call it inside a Task.detached or on a background DispatchQueue.
- Re-creating the model on every call. MobileNetV2() deserializes the model file each time; store the VNCoreMLModel as a lazy property on your classifier rather than building it inside the classify function, and you avoid 100–300 ms of cold-start overhead on repeat calls (see the sketch after this list).
- Ignoring .imageCropAndScaleOption. The default is .scaleFit, which letterboxes non-square images. Most classification models were trained on center-cropped squares, so accuracy degrades noticeably on portrait photos unless you override to .centerCrop.
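A sketch of that caching, assuming the MobileNetV2 model from the examples above (ModelCache is an illustrative name; any property that outlives classify(_:) works):

import CoreML
import Vision

/// Loads the Vision-wrapped model once and reuses it on every call.
enum ModelCache {
    static let mobileNetV2: VNCoreMLModel = {
        let configuration = MLModelConfiguration()
        configuration.computeUnits = .all
        // try! is tolerable only for a model bundled with the app:
        // a load failure here is a build problem, not a runtime condition.
        return try! VNCoreMLModel(for: MobileNetV2(configuration: configuration).model)
    }()
}

// Inside the detached task in classify(_:), reuse it instead of rebuilding:
// let request = VNCoreMLRequest(model: ModelCache.mobileNetV2)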
Prompt this with Claude Code
When using Soarias or Claude Code directly to implement this:
Implement on-device ML image classification in SwiftUI for iOS 17+. Use CoreML and Vision (VNCoreMLModel, VNCoreMLRequest, VNClassificationObservation). Run inference on a Task.detached so the main thread is never blocked. Expose results via an @Observable class. Make it accessible (VoiceOver labels on confidence bars). Add a #Preview with realistic sample data.
In Soarias, paste this into the Build phase after your screen layout is locked — Claude Code will wire the CoreML
pipeline to your existing view model and add the necessary Info.plist
privacy keys automatically.
FAQ
Does this work on iOS 16?
Partially. The Vision and CoreML APIs used here (VNCoreMLRequest, MLModelConfiguration) long predate iOS 16, and PhotosPicker itself is available from iOS 16. However, the @Observable macro and the two-parameter onChange(of:) overload require iOS 17+. For iOS 16, make the classifier an ObservableObject with @Published properties (held via @StateObject); for iOS 15 and earlier, also swap PhotosPicker for a UIViewControllerRepresentable wrapping PHPickerViewController.
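A sketch of the iOS 16 observation fallback, keeping the same surface as the Classifier above (LegacyClassifier is an illustrative name, not part of the original example):

import Combine
import UIKit

/// iOS 16 fallback: ObservableObject + @Published replace the @Observable macro.
@MainActor
final class LegacyClassifier: ObservableObject {
    @Published var labels: [(label: String, confidence: Float)] = []
    @Published var isRunning = false
    @Published var errorMessage: String?

    func classify(_ uiImage: UIImage) async {
        // Same body as the @Observable version above; only the
        // property-wrapper plumbing differs.
    }
}

// In the view, hold it with @StateObject instead of @State:
// @StateObject private var classifier = LegacyClassifier()
// and use the single-value onChange(of:) overload available on iOS 16.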
How do I quantize my model to reduce download size?
Use coremltools with
ct.compression_utils.palettize_weights or
ct.compression_utils.linear_quantize_weights to
reduce a 20 MB float32 model to ~5 MB at 8-bit precision with negligible accuracy loss. Xcode 16 also ships
an on-device compression tool under the "CoreML Model" inspector — select your
.mlpackage and click Optimize for Deployment.
What's the UIKit equivalent?
In UIKit you'd call VNImageRequestHandler.perform(_:)
inside a DispatchQueue.global(qos: .userInitiated).async
block, then dispatch results back to DispatchQueue.main
to update a UITableView. The Vision pipeline is
identical — only the concurrency and UI-binding primitives differ.
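A minimal sketch of that UIKit flow, assuming the same bundled MobileNetV2 model (the view controller and its results property are placeholders, not part of the original example):

import CoreML
import UIKit
import Vision

/// Hypothetical UIKit screen; only the threading pattern matters here.
final class ClassificationViewController: UITableViewController {
    private var results: [(label: String, confidence: Float)] = []

    func classify(_ uiImage: UIImage) {
        guard let cgImage = uiImage.cgImage else { return }
        DispatchQueue.global(qos: .userInitiated).async { [weak self] in
            // Same Vision pipeline as the SwiftUI version, kept off the main thread.
            // (Model caching omitted for brevity; see the pitfalls section.)
            guard let vnModel = try? VNCoreMLModel(for: MobileNetV2().model) else { return }
            let request = VNCoreMLRequest(model: vnModel)
            request.imageCropAndScaleOption = .centerCrop
            try? VNImageRequestHandler(cgImage: cgImage).perform([request])
            let top = (request.results as? [VNClassificationObservation] ?? [])
                .prefix(5)
                .map { (label: $0.identifier, confidence: $0.confidence) }

            // Hop back to the main queue before touching any UIKit view.
            DispatchQueue.main.async {
                self?.results = top
                self?.tableView.reloadData()
            }
        }
    }
}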
Last reviewed: 2026-05-11 by the Soarias team.