How to Build On-Device ML in SwiftUI
Drag a .mlmodel into Xcode, wrap it in a VNCoreMLModel, run a VNCoreMLRequest off the main actor, and stream VNClassificationObservation results into an @Observable class. The whole pipeline runs entirely on-device.
import CoreML
import Observation
import UIKit
import Vision

@Observable
final class Classifier {
    var labels: [(String, Float)] = []

    func classify(_ uiImage: UIImage) async throws {
        // Wrap the Xcode-generated CoreML class for use with Vision.
        let vnModel = try VNCoreMLModel(for: MobileNetV2().model)
        let request = VNCoreMLRequest(model: vnModel)
        guard let cgImage = uiImage.cgImage else { return }
        // perform(_:) is synchronous, so call classify(_:) off the main thread.
        try VNImageRequestHandler(cgImage: cgImage).perform([request])
        labels = (request.results as? [VNClassificationObservation] ?? [])
            .prefix(5)
            .map { ($0.identifier, $0.confidence) }
    }
}
Full implementation
The example below builds a self-contained image-classification screen. The user picks a photo from the library via PhotosPicker; the Classifier class (isolated to the main actor) hands the CoreML pipeline to a detached task so the UI stays interactive throughout. Results are rendered as a list of labeled progress bars so confidence scores are immediately legible.
import SwiftUI
import CoreML
import Vision
import PhotosUI
// MARK: - Model
@Observable
@MainActor
final class Classifier {
var labels: [(label: String, confidence: Float)] = []
var isRunning = false
var errorMessage: String?
func classify(_ uiImage: UIImage) async {
isRunning = true
errorMessage = nil
defer { isRunning = false }
do {
// Move heavy work off the main actor
let results: [(String, Float)] = try await Task.detached(priority: .userInitiated) {
let configuration = MLModelConfiguration()
configuration.computeUnits = .all // CPU + Neural Engine + GPU
let vnModel = try VNCoreMLModel(
for: MobileNetV2(configuration: configuration).model
)
let request = VNCoreMLRequest(model: vnModel)
request.imageCropAndScaleOption = .centerCrop
guard let cgImage = uiImage.cgImage else {
throw ClassifierError.invalidImage
}
try VNImageRequestHandler(cgImage: cgImage, options: [:])
.perform([request])
return (request.results as? [VNClassificationObservation] ?? [])
.prefix(5)
.map { ($0.identifier.capitalized, $0.confidence) }
}.value
labels = results.map { (label: $0.0, confidence: $0.1) }
} catch {
errorMessage = error.localizedDescription
}
}
}
enum ClassifierError: LocalizedError {
case invalidImage
var errorDescription: String? { "Could not convert image to CGImage." }
}
// MARK: - View
struct OnDeviceMLView: View {
@State private var classifier = Classifier()
@State private var selectedItem: PhotosPickerItem?
@State private var displayImage: Image?
var body: some View {
NavigationStack {
ScrollView {
VStack(spacing: 24) {
// Photo picker
PhotosPicker(
selection: $selectedItem,
matching: .images,
photoLibrary: .shared()
) {
ZStack {
RoundedRectangle(cornerRadius: 16)
.fill(Color(.secondarySystemBackground))
.frame(height: 260)
if let displayImage {
displayImage
.resizable()
.scaledToFill()
.frame(height: 260)
.clipShape(RoundedRectangle(cornerRadius: 16))
} else {
Label("Choose a photo", systemImage: "photo.badge.plus")
.foregroundStyle(.secondary)
}
}
}
.onChange(of: selectedItem) { _, newItem in
Task {
guard let newItem,
let data = try? await newItem.loadTransferable(type: Data.self),
let uiImage = UIImage(data: data) else { return }
displayImage = Image(uiImage: uiImage)
await classifier.classify(uiImage)
}
}
.accessibilityLabel("Photo picker")
// Results
if classifier.isRunning {
ProgressView("Classifying…")
.progressViewStyle(.circular)
} else if let error = classifier.errorMessage {
Text(error)
.foregroundStyle(.red)
.font(.caption)
} else if !classifier.labels.isEmpty {
ResultsGrid(labels: classifier.labels)
}
Spacer()
}
.padding()
}
.navigationTitle("On-Device ML")
}
}
}
struct ResultsGrid: View {
let labels: [(label: String, confidence: Float)]
var body: some View {
VStack(alignment: .leading, spacing: 12) {
Text("Top predictions")
.font(.headline)
ForEach(labels, id: \.label) { item in
VStack(alignment: .leading, spacing: 4) {
HStack {
Text(item.label)
.font(.subheadline)
Spacer()
Text(String(format: "%.1f%%", item.confidence * 100))
.font(.caption.monospacedDigit())
.foregroundStyle(.secondary)
}
ProgressView(value: Double(item.confidence))
.tint(confidenceColor(item.confidence))
.accessibilityLabel("\(item.label): \(Int(item.confidence * 100)) percent confidence")
}
}
}
.padding()
.background(Color(.secondarySystemBackground), in: RoundedRectangle(cornerRadius: 14))
}
private func confidenceColor(_ c: Float) -> Color {
c > 0.7 ? .green : c > 0.4 ? .orange : .red
}
}
#Preview {
OnDeviceMLView()
}
How it works
- MLModelConfiguration with .computeUnits = .all: CoreML automatically routes work across the CPU, GPU, and Apple Neural Engine based on the model's operator graph. Forcing .cpuOnly is slower on modern chips; leave it as .all unless you're benchmarking (see the sketch after this list).
- Task.detached(priority: .userInitiated): even though Classifier is @MainActor, the heavy prediction work runs on a detached task so frames never drop. Awaiting .value bridges the result back to the main actor safely.
- request.imageCropAndScaleOption = .centerCrop: MobileNetV2 expects a 224×224 square input. .centerCrop tells Vision to crop from the center rather than letterbox, which matches how the model was trained and improves accuracy.
- @Observable with defer { isRunning = false }: the defer block guarantees the spinner disappears even if an error is thrown mid-classification, keeping the UI consistent without duplicating cleanup in every catch branch.
- ProgressView confidence bars: each VNClassificationObservation.confidence is already a Float in the 0–1 range, so it maps directly to ProgressView(value:) without any normalization math.
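If you do want to benchmark individual compute units, only the configuration changes. A minimal sketch, assuming the same bundled MobileNetV2 model (the makeModel helper name is illustrative, not part of the example above):

import CoreML
import Vision

/// Builds a Vision-wrapped model pinned to a specific set of compute units,
/// handy for comparing CPU-only vs. Neural Engine latency.
func makeModel(computeUnits: MLComputeUnits) throws -> VNCoreMLModel {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = computeUnits  // .cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine, or .all
    return try VNCoreMLModel(for: MobileNetV2(configuration: configuration).model)
}

// Usage: time identical requests against different configurations.
// let cpuOnly = try makeModel(computeUnits: .cpuOnly)
// let everything = try makeModel(computeUnits: .all)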
Variants
Live camera feed classification with AVFoundation
Instead of a picked image, pipe CMSampleBuffer frames
directly to Vision. Because VNImageRequestHandler accepts
a CVPixelBuffer, no JPEG round-trip is needed — latency
drops to single-digit milliseconds on A15+ chips.
// In your AVCaptureVideoDataOutputSampleBufferDelegate:
func captureOutput(_ output: AVCaptureOutput,
didOutput sampleBuffer: CMSampleBuffer,
from connection: AVCaptureConnection) {
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
let request = VNCoreMLRequest(model: vnModel) { [weak self] req, _ in
let top = (req.results as? [VNClassificationObservation])?
.first
DispatchQueue.main.async {
self?.topLabel = top?.identifier.capitalized ?? "—"
self?.topConfidence = top?.confidence ?? 0
}
}
request.imageCropAndScaleOption = .centerCrop
try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
orientation: .right).perform([request])
}
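The delegate above assumes an AVCaptureSession is already delivering frames. A minimal sketch of that wiring, under the assumption that the delegate and its vnModel live elsewhere (FrameProvider is an illustrative name; the app also needs NSCameraUsageDescription in Info.plist):

import AVFoundation

/// Minimal capture pipeline that feeds frames to the delegate method above.
final class FrameProvider {
    let session = AVCaptureSession()
    private let videoQueue = DispatchQueue(label: "camera.frames")

    func configure(delegate: AVCaptureVideoDataOutputSampleBufferDelegate) {
        session.beginConfiguration()
        defer { session.commitConfiguration() }

        guard let device = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                   for: .video, position: .back),
              let input = try? AVCaptureDeviceInput(device: device),
              session.canAddInput(input) else { return }
        session.addInput(input)

        let output = AVCaptureVideoDataOutput()
        output.alwaysDiscardsLateVideoFrames = true  // drop stale frames instead of queueing them
        output.setSampleBufferDelegate(delegate, queue: videoQueue)
        if session.canAddOutput(output) { session.addOutput(output) }
    }
}

// Usage: provider.configure(delegate: self), then call
// provider.session.startRunning() from a background queue.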
Using a custom Create ML model
Train a model in Create ML (or export from PyTorch/TensorFlow via coremltools),
drag the generated .mlpackage into Xcode, and
replace MobileNetV2() with your class name — the rest of
the pipeline is identical. Xcode auto-generates the typed Swift wrapper on build, including input/output feature
descriptions so you get compile-time safety on feature names.
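For example, if Xcode generated a class named FlowerClassifier from your .mlpackage (a purely illustrative name; Xcode derives it from the file name), the only line that changes is the model-loading one:

// Before: the bundled MobileNetV2 model.
let vnModel = try VNCoreMLModel(for: MobileNetV2(configuration: configuration).model)

// After: your custom model. "FlowerClassifier" stands in for whatever
// class Xcode generates from the .mlpackage file name.
let vnModel = try VNCoreMLModel(for: FlowerClassifier(configuration: configuration).model)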
Common pitfalls
- Forgetting the photo library privacy key. The out-of-process PhotosPicker itself doesn't need authorization, but any direct PHPhotoLibrary access (fetching assets, saving edits) still requires NSPhotoLibraryUsageDescription in Info.plist; omit it and the app crashes with SIGABRT the moment the library is accessed.
- Running Vision requests on the main thread. VNImageRequestHandler.perform(_:) is synchronous and blocks its calling thread for tens to hundreds of milliseconds. Always call it inside a Task.detached or on a background DispatchQueue.
- Re-creating the model on every call. MobileNetV2() deserializes the model file each time; store the VNCoreMLModel as a lazy property on your classifier rather than building it inside the classify function, and you avoid 100–300 ms of cold-start overhead on repeat calls (see the sketch after this list).
- Ignoring .imageCropAndScaleOption. The default is .scaleFit, which letterboxes non-square images. Most classification models were trained on center-cropped squares, so accuracy degrades noticeably on portrait photos unless you override to .centerCrop.
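A sketch of that caching, assuming the MobileNetV2 model from the examples above (ModelCache is an illustrative name; any property that outlives classify(_:) works):

import CoreML
import Vision

/// Loads the Vision-wrapped model once and reuses it on every call.
enum ModelCache {
    static let mobileNetV2: VNCoreMLModel = {
        let configuration = MLModelConfiguration()
        configuration.computeUnits = .all
        // try! is tolerable only for a model bundled with the app:
        // a load failure here is a build problem, not a runtime condition.
        return try! VNCoreMLModel(for: MobileNetV2(configuration: configuration).model)
    }()
}

// Inside the detached task in classify(_:), reuse it instead of rebuilding:
// let request = VNCoreMLRequest(model: ModelCache.mobileNetV2)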
Prompt this with Claude Code
When using Soarias or Claude Code directly to implement this:
Implement on-device ML image classification in SwiftUI for iOS 17+. Use CoreML and Vision (VNCoreMLModel, VNCoreMLRequest, VNClassificationObservation). Run inference on a Task.detached so the main thread is never blocked. Expose results via an @Observable class. Make it accessible (VoiceOver labels on confidence bars). Add a #Preview with realistic sample data.
In Soarias, paste this into the Build phase after your screen layout is locked — Claude Code will wire the CoreML
pipeline to your existing view model and add the necessary Info.plist
privacy keys automatically.
FAQ
Does this work on iOS 16?
Partially. The Vision and CoreML APIs used here (VNCoreMLRequest, MLModelConfiguration) long predate iOS 16, and PhotosPicker itself is available from iOS 16. However, the @Observable macro and the two-parameter onChange(of:) overload require iOS 17+. For iOS 16, make the classifier an ObservableObject with @Published properties (held via @StateObject); for iOS 15 and earlier, also swap PhotosPicker for a UIViewControllerRepresentable wrapping PHPickerViewController.
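A sketch of the iOS 16 observation fallback, keeping the same surface as the Classifier above (LegacyClassifier is an illustrative name, not part of the original example):

import Combine
import UIKit

/// iOS 16 fallback: ObservableObject + @Published replace the @Observable macro.
@MainActor
final class LegacyClassifier: ObservableObject {
    @Published var labels: [(label: String, confidence: Float)] = []
    @Published var isRunning = false
    @Published var errorMessage: String?

    func classify(_ uiImage: UIImage) async {
        // Same body as the @Observable version above; only the
        // property-wrapper plumbing differs.
    }
}

// In the view, hold it with @StateObject instead of @State:
// @StateObject private var classifier = LegacyClassifier()
// and use the single-value onChange(of:) overload available on iOS 16.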
How do I quantize my model to reduce download size?
Use coremltools with
ct.compression_utils.palettize_weights or
ct.compression_utils.linear_quantize_weights to
reduce a 20 MB float32 model to ~5 MB at 8-bit precision with negligible accuracy loss. Xcode 16 also ships
an on-device compression tool under the "CoreML Model" inspector — select your
.mlpackage and click Optimize for Deployment.
What's the UIKit equivalent?
In UIKit you'd call VNImageRequestHandler.perform(_:)
inside a DispatchQueue.global(qos: .userInitiated).async
block, then dispatch results back to DispatchQueue.main
to update a UITableView. The Vision pipeline is
identical — only the concurrency and UI-binding primitives differ.
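A minimal sketch of that UIKit flow, assuming the same bundled MobileNetV2 model (the view controller and its results property are placeholders, not part of the original example):

import CoreML
import UIKit
import Vision

/// Hypothetical UIKit screen; only the threading pattern matters here.
final class ClassificationViewController: UITableViewController {
    private var results: [(label: String, confidence: Float)] = []

    func classify(_ uiImage: UIImage) {
        guard let cgImage = uiImage.cgImage else { return }
        DispatchQueue.global(qos: .userInitiated).async { [weak self] in
            // Same Vision pipeline as the SwiftUI version, kept off the main thread.
            // (Model caching omitted for brevity; see the pitfalls section.)
            guard let vnModel = try? VNCoreMLModel(for: MobileNetV2().model) else { return }
            let request = VNCoreMLRequest(model: vnModel)
            request.imageCropAndScaleOption = .centerCrop
            try? VNImageRequestHandler(cgImage: cgImage).perform([request])
            let top = (request.results as? [VNClassificationObservation] ?? [])
                .prefix(5)
                .map { (label: $0.identifier, confidence: $0.confidence) }

            // Hop back to the main queue before touching any UIKit view.
            DispatchQueue.main.async {
                self?.results = top
                self?.tableView.reloadData()
            }
        }
    }
}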
Last reviewed: 2026-05-11 by the Soarias team.