How to Implement Face Detection in SwiftUI
Create a VNDetectFaceRectanglesRequest, run it through
VNImageRequestHandler, and convert each
VNFaceObservation's normalized
boundingBox to view coordinates — then overlay
rectangles in SwiftUI with a GeometryReader.
import Vision
import SwiftUI
// perform(_:) runs synchronously — call this off the main thread (see the full implementation below).
func detectFaces(in cgImage: CGImage) throws -> [CGRect] {
    let request = VNDetectFaceRectanglesRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])
    return request.results?.map(\.boundingBox) ?? []
}
// boundingBox is in Vision's normalized coords (origin bottom-left).
// Flip Y before mapping to view space:
// viewRect.origin.y = viewHeight - (box.origin.y + box.height) * viewHeight
Full implementation
The view below lets a user pick any photo from their library, runs
VNDetectFaceRectanglesRequest on it off the main thread,
and draws a teal bounding box for every face found. A
GeometryReader tracks the image's rendered size so the
coordinate conversion stays accurate at any screen scale. All heavy Vision work is dispatched with
Task.detached so the UI never blocks.
import SwiftUI
import Vision
import PhotosUI
// MARK: - Model
// @MainActor keeps mutations of the observable UI state on the main thread;
// the Vision work itself is hopped off the main actor below.
@MainActor @Observable
final class FaceDetectionModel {
    var selectedItem: PhotosPickerItem?
    var displayImage: UIImage?
    var faceRects: [CGRect] = []
    var isDetecting = false
    var errorMessage: String?

    func loadAndDetect() async {
        guard let item = selectedItem else { return }
        isDetecting = true
        faceRects = []
        errorMessage = nil

        do {
            guard let data = try await item.loadTransferable(type: Data.self),
                  let uiImage = UIImage(data: data),
                  let cgImage = uiImage.cgImage else {
                errorMessage = "Could not load image."
                isDetecting = false
                return
            }
            displayImage = uiImage

            // Run Vision on a background thread
            let rects = try await Task.detached(priority: .userInitiated) {
                try Self.runDetection(on: cgImage)
            }.value
            faceRects = rects
        } catch {
            errorMessage = error.localizedDescription
        }
        isDetecting = false
    }

    // nonisolated so the detached task actually runs this off the main actor.
    nonisolated private static func runDetection(on cgImage: CGImage) throws -> [CGRect] {
        let request = VNDetectFaceRectanglesRequest()
        request.revision = VNDetectFaceRectanglesRequestRevision3
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        try handler.perform([request])
        return request.results?.map(\.boundingBox) ?? []
    }
}
// MARK: - Coordinate Conversion
extension CGRect {
    /// Vision uses normalized coords with origin at bottom-left.
    /// Convert to UIKit/SwiftUI coords (origin top-left) within `viewSize`.
    func toViewRect(in viewSize: CGSize) -> CGRect {
        CGRect(
            x: origin.x * viewSize.width,
            y: (1 - origin.y - height) * viewSize.height,
            width: width * viewSize.width,
            height: height * viewSize.height
        )
    }
}
// MARK: - Main View
struct FaceDetectionView: View {
    @State private var model = FaceDetectionModel()

    var body: some View {
        NavigationStack {
            VStack(spacing: 20) {
                PhotosPicker(
                    selection: $model.selectedItem,
                    matching: .images,
                    photoLibrary: .shared()
                ) {
                    Label("Choose Photo", systemImage: "photo.on.rectangle")
                        .font(.headline)
                        .padding()
                        .frame(maxWidth: .infinity)
                        .background(Color.teal.opacity(0.15))
                        .clipShape(RoundedRectangle(cornerRadius: 12))
                }
                .padding(.horizontal)
                .onChange(of: model.selectedItem) {
                    Task { await model.loadAndDetect() }
                }

                if let image = model.displayImage {
                    GeometryReader { geo in
                        let size = geo.size
                        ZStack(alignment: .topLeading) {
                            Image(uiImage: image)
                                .resizable()
                                .scaledToFit()
                                .frame(maxWidth: .infinity)

                            ForEach(Array(model.faceRects.enumerated()), id: \.offset) { _, rect in
                                let viewRect = rect.toViewRect(in: size)
                                Rectangle()
                                    .stroke(Color.teal, lineWidth: 2.5)
                                    .frame(width: viewRect.width, height: viewRect.height)
                                    .offset(x: viewRect.minX, y: viewRect.minY)
                                    .accessibilityLabel("Detected face region")
                            }
                        }
                    }
                    .aspectRatio(
                        image.size.width / image.size.height,
                        contentMode: .fit
                    )
                    .padding(.horizontal)
                } else {
                    ContentUnavailableView(
                        "No Image Selected",
                        systemImage: "face.dashed",
                        description: Text("Pick a photo to detect faces.")
                    )
                }

                if model.isDetecting {
                    ProgressView("Detecting faces…")
                }

                if let error = model.errorMessage {
                    Text(error)
                        .foregroundStyle(.red)
                        .font(.caption)
                }

                if !model.faceRects.isEmpty {
                    Text("\(model.faceRects.count) face\(model.faceRects.count == 1 ? "" : "s") found")
                        .font(.subheadline)
                        .foregroundStyle(.secondary)
                }

                Spacer()
            }
            .navigationTitle("Face Detection")
        }
    }
}

// MARK: - Preview
#Preview {
    FaceDetectionView()
}
How it works
- VNDetectFaceRectanglesRequest + revision 3 — Pinning revision to VNDetectFaceRectanglesRequestRevision3 locks the model version so detection results stay consistent across OS updates. The request produces an array of VNFaceObservation objects, each carrying a boundingBox in normalized coordinates (0…1 on both axes, origin at bottom-left).
- Task.detached for responsiveness — VNImageRequestHandler.perform(_:) is synchronous and CPU-intensive. Wrapping it in Task.detached(priority: .userInitiated) moves it off the main actor, preventing UI jank while detection runs. Note that a detached task does not inherit its parent's cancellation, so a long detection will run to completion even if the view disappears.
- Y-axis flip in toViewRect(in:) — Vision's coordinate system has the origin at the bottom-left while SwiftUI places it at the top-left. The formula (1 - origin.y - height) * viewHeight correctly maps each face rectangle without clipping at the edges; a worked example follows this list.
- GeometryReader for live size tracking — The image renders at a size determined by .scaledToFit(), which changes with device orientation. Passing geo.size into toViewRect(in:) at draw time ensures boxes stay perfectly aligned even after rotation.
- @Observable FaceDetectionModel — Using the iOS 17 @Observable macro instead of ObservableObject removes boilerplate and means SwiftUI only re-renders the parts of the view that actually read a changed property — the progress spinner, error text, and face count update independently of the image.
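As a quick sanity check on the flip, here is a tiny worked example (illustrative numbers only) using the toViewRect(in:) extension above:

// Illustrative values: a face at normalized (0.2, 0.3, 0.4, 0.5)
// mapped into a 300×400-point view.
let box = CGRect(x: 0.2, y: 0.3, width: 0.4, height: 0.5)
let viewRect = box.toViewRect(in: CGSize(width: 300, height: 400))
// viewRect == CGRect(x: 60, y: 80, width: 120, height: 200)
// y: (1 - 0.3 - 0.5) * 400 = 80 — the box top sits 80 pt below the view's top,
// mirroring Vision's "bottom edge 120 pt above the view's bottom".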
Variants
Facial landmarks (eyes, nose, mouth)
Swap to VNDetectFaceLandmarksRequest and read
each observation's landmarks property for
detailed facial geometry (eye, eyebrow, nose, and lip contours). Useful for AR overlays or accessibility features.
func detectLandmarks(on cgImage: CGImage) throws -> [VNFaceObservation] {
    let request = VNDetectFaceLandmarksRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])
    return request.results ?? []
}

// In your drawing layer:
ForEach(Array(observations.enumerated()), id: \.offset) { _, obs in
    if let allPoints = obs.landmarks?.allPoints {
        // Landmarks are normalized relative to the face bounding box,
        // so convert the box to view space once, then place each point inside it.
        let faceRect = obs.boundingBox.toViewRect(in: viewSize)
        let points = allPoints.normalizedPoints.map { pt in
            CGPoint(
                x: faceRect.minX + pt.x * faceRect.width,
                y: faceRect.minY + (1 - pt.y) * faceRect.height
            )
        }
        Path { path in
            guard let first = points.first else { return }
            path.move(to: first)
            points.dropFirst().forEach { path.addLine(to: $0) }
        }
        .stroke(Color.yellow, lineWidth: 1)
    }
}
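If you only need part of the geometry, VNFaceLandmarks2D also exposes individual regions — leftEye, rightEye, nose, outerLips, faceContour, and so on — each with its own normalizedPoints array.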
Live camera feed with AVFoundation
For real-time detection, configure an
AVCaptureSession with an
AVCaptureVideoDataOutput and run
VNDetectFaceRectanglesRequest in the
captureOutput(_:didOutput:from:) delegate
callback. Keep preferBackgroundProcessing = false
(its default) on the request for lowest latency. Wrap the session's preview in a
UIViewControllerRepresentable. If you only need basic face presence rather than
Vision's precision, AVCaptureMetadataOutput with the .face metadata object type
is a simpler alternative.
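A minimal sketch of that delegate callback, assuming the AVCaptureSession, camera input, and preview layer are configured elsewhere; the .right orientation is an assumption for a portrait-held back camera:

import AVFoundation
import Vision

final class FaceFrameDelegate: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // Reuse one request across frames instead of allocating per callback.
    private let request = VNDetectFaceRectanglesRequest()

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let handler = VNImageRequestHandler(
            cvPixelBuffer: pixelBuffer,
            orientation: .right,   // assumption: portrait back camera
            options: [:]
        )
        do {
            // Synchronous, but this already runs on the output's background queue.
            try handler.perform([request])
            let boxes = request.results?.map(\.boundingBox) ?? []
            DispatchQueue.main.async {
                // Hand the normalized boxes to your overlay layer here.
                _ = boxes
            }
        } catch {
            // Dropping a frame on error is acceptable for a live preview.
        }
    }
}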
Common pitfalls
- ⚠️ Forgetting the Y-axis flip. Vision's origin is at the bottom-left; SwiftUI's is at the top-left. Skipping the (1 - y - h) transform places every bounding box vertically mirrored in the view, often completely off-screen.
- ⚠️ Running VNImageRequestHandler on the main thread. perform(_:) blocks the calling thread for the full inference duration (10–200 ms per image). Always dispatch it with Task.detached or a serial background queue to avoid dropped frames and visible UI hangs.
- ⚠️ Converting landmark coordinates with the wrong size. Landmark normalized points are relative to the face's bounding box, which is itself normalized to the original CGImage — not the SwiftUI view. Compute view rects from the rendered GeometryReader size, not UIImage.size, or boxes will drift when the image is letterboxed.
- ⚠️ Missing NSCameraUsageDescription. If you extend this to live camera, add NSCameraUsageDescription to your Info.plist, or the system will terminate the app the first time it touches the camera.
Prompt this with Claude Code
When using Soarias or Claude Code directly to implement this:
Implement face detection in SwiftUI for iOS 17+. Use Vision/VNFaceObservation and VNDetectFaceRectanglesRequest. Draw bounding boxes over detected faces using GeometryReader + ZStack. Handle Y-axis coordinate flip from Vision normalized space. Make it accessible (VoiceOver labels for each detected face region). Add a #Preview with realistic sample data.
Drop this prompt into Soarias during the Build phase after you've locked your screen designs — the Vision scaffolding is boilerplate-heavy, and Claude Code will generate the request handler, coordinate math, and async dispatch correctly in one pass, leaving you to focus on the product layer.
FAQ
Does this work on iOS 16?
VNDetectFaceRectanglesRequest itself is
available back to iOS 11, but the @Observable
macro and ContentUnavailableView used in the
full implementation require iOS 17. Replace those two pieces with
@StateObject /
ObservableObject and a plain
VStack placeholder and the detection logic
runs fine on iOS 16. The code on this page targets iOS 17+ as-is.
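A minimal sketch of that substitution, keeping the same property names as the iOS 17 model above:

import SwiftUI
import PhotosUI

// iOS 16 fallback: ObservableObject instead of the @Observable macro.
final class FaceDetectionModel: ObservableObject {
    @Published var selectedItem: PhotosPickerItem?
    @Published var displayImage: UIImage?
    @Published var faceRects: [CGRect] = []
    @Published var isDetecting = false
    @Published var errorMessage: String?
    // loadAndDetect() and runDetection(on:) stay exactly as above.
}

// In the view, swap @State for @StateObject:
// @StateObject private var model = FaceDetectionModel()
// On iOS 16 also use the single-parameter onChange:
// .onChange(of: model.selectedItem) { _ in Task { await model.loadAndDetect() } }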
Can I detect face attributes like age or emotion?
Apple's on-device Vision framework deliberately does not expose age, emotion, or identity
attributes — only geometry (bounding boxes, landmarks, roll/yaw angles, and capture quality
score via VNFaceObservation.faceCaptureQuality).
For richer attributes you'd need to feed the cropped face region into a custom
VNCoreMLRequest with a trained model, or
use a server-side API. This keeps the Vision API privacy-preserving and App Store safe.
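For illustration only, a sketch of that custom-model route; FaceAttributeClassifier is a hypothetical Core ML model class standing in for one you would train and bundle yourself:

import CoreML
import Vision

// Hypothetical: classify a cropped face region with a bundled Core ML model.
// "FaceAttributeClassifier" does not ship with Vision — it is a placeholder.
func classifyFace(_ faceCrop: CGImage) throws -> [VNClassificationObservation] {
    let mlModel = try FaceAttributeClassifier(configuration: MLModelConfiguration()).model
    let request = VNCoreMLRequest(model: try VNCoreMLModel(for: mlModel))
    try VNImageRequestHandler(cgImage: faceCrop).perform([request])
    return request.results as? [VNClassificationObservation] ?? []
}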
What's the UIKit equivalent?
In UIKit you'd still use the same Vision request, but overlay bounding boxes using a
CAShapeLayer added to the
UIImageView's layer. The coordinate conversion
math is identical — Vision's normalized space is framework-level, not tied to SwiftUI. The
main difference is that you compute the layer frame using
AVMakeRect(aspectRatio:insideRect:) to account
for contentMode = .scaleAspectFit letterboxing.
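A minimal UIKit sketch of that conversion, assuming the image view uses .scaleAspectFit and reusing the same Y-flip as the SwiftUI version:

import AVFoundation
import UIKit

func overlayFaces(_ boxes: [CGRect], on imageView: UIImageView) {
    guard let image = imageView.image else { return }
    // The rect the image actually occupies inside the (possibly letterboxed) view.
    let fitted = AVMakeRect(aspectRatio: image.size, insideRect: imageView.bounds)
    for box in boxes {
        // Same Y-flip as toViewRect(in:), offset by the letterbox origin.
        let frame = CGRect(
            x: fitted.minX + box.minX * fitted.width,
            y: fitted.minY + (1 - box.minY - box.height) * fitted.height,
            width: box.width * fitted.width,
            height: box.height * fitted.height
        )
        let layer = CAShapeLayer()
        layer.path = UIBezierPath(rect: CGRect(origin: .zero, size: frame.size)).cgPath
        layer.frame = frame
        layer.strokeColor = UIColor.systemTeal.cgColor
        layer.fillColor = UIColor.clear.cgColor
        layer.lineWidth = 2.5
        imageView.layer.addSublayer(layer)
    }
}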
Last reviewed: 2026-05-12 by the Soarias team.