How to Implement Face Detection in SwiftUI
Create a VNDetectFaceRectanglesRequest, run it through
VNImageRequestHandler, and convert each
VNFaceObservation's normalized
boundingBox to view coordinates — then overlay
rectangles in SwiftUI with a GeometryReader.
import Vision
import SwiftUI
// perform(_:) runs synchronously — call this off the main thread (see the full implementation below).
func detectFaces(in cgImage: CGImage) throws -> [CGRect] {
    let request = VNDetectFaceRectanglesRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])
    return request.results?.map(\.boundingBox) ?? []
}
// boundingBox is in Vision's normalized coords (origin bottom-left).
// Flip Y before mapping to view space:
// viewRect.origin.y = viewHeight - (box.origin.y + box.height) * viewHeight
Full implementation
The view below lets a user pick any photo from their library, runs
VNDetectFaceRectanglesRequest on it off the main thread,
and draws a teal bounding box for every face found. A
GeometryReader tracks the image's rendered size so the
coordinate conversion stays accurate at any screen scale. All heavy Vision work is dispatched with
Task.detached so the UI never blocks.
import SwiftUI
import Vision
import PhotosUI
// MARK: - Model
// @MainActor keeps mutations of the observable UI state on the main thread;
// the Vision work itself is hopped off the main actor below.
@MainActor @Observable
final class FaceDetectionModel {
    var selectedItem: PhotosPickerItem?
    var displayImage: UIImage?
    var faceRects: [CGRect] = []
    var isDetecting = false
    var errorMessage: String?

    func loadAndDetect() async {
        guard let item = selectedItem else { return }
        isDetecting = true
        faceRects = []
        errorMessage = nil

        do {
            guard let data = try await item.loadTransferable(type: Data.self),
                  let uiImage = UIImage(data: data),
                  let cgImage = uiImage.cgImage else {
                errorMessage = "Could not load image."
                isDetecting = false
                return
            }
            displayImage = uiImage

            // Run Vision on a background thread
            let rects = try await Task.detached(priority: .userInitiated) {
                try Self.runDetection(on: cgImage)
            }.value
            faceRects = rects
        } catch {
            errorMessage = error.localizedDescription
        }
        isDetecting = false
    }

    // nonisolated so the detached task actually runs this off the main actor.
    nonisolated private static func runDetection(on cgImage: CGImage) throws -> [CGRect] {
        let request = VNDetectFaceRectanglesRequest()
        request.revision = VNDetectFaceRectanglesRequestRevision3
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        try handler.perform([request])
        return request.results?.map(\.boundingBox) ?? []
    }
}
// MARK: - Coordinate Conversion
extension CGRect {
    /// Vision uses normalized coords with origin at bottom-left.
    /// Convert to UIKit/SwiftUI coords (origin top-left) within `viewSize`.
    func toViewRect(in viewSize: CGSize) -> CGRect {
        CGRect(
            x: origin.x * viewSize.width,
            y: (1 - origin.y - height) * viewSize.height,
            width: width * viewSize.width,
            height: height * viewSize.height
        )
    }
}
// MARK: - Main View
struct FaceDetectionView: View {
    @State private var model = FaceDetectionModel()

    var body: some View {
        NavigationStack {
            VStack(spacing: 20) {
                PhotosPicker(
                    selection: $model.selectedItem,
                    matching: .images,
                    photoLibrary: .shared()
                ) {
                    Label("Choose Photo", systemImage: "photo.on.rectangle")
                        .font(.headline)
                        .padding()
                        .frame(maxWidth: .infinity)
                        .background(Color.teal.opacity(0.15))
                        .clipShape(RoundedRectangle(cornerRadius: 12))
                }
                .padding(.horizontal)
                .onChange(of: model.selectedItem) {
                    Task { await model.loadAndDetect() }
                }

                if let image = model.displayImage {
                    GeometryReader { geo in
                        let size = geo.size
                        ZStack(alignment: .topLeading) {
                            Image(uiImage: image)
                                .resizable()
                                .scaledToFit()
                                .frame(maxWidth: .infinity)

                            ForEach(Array(model.faceRects.enumerated()), id: \.offset) { _, rect in
                                let viewRect = rect.toViewRect(in: size)
                                Rectangle()
                                    .stroke(Color.teal, lineWidth: 2.5)
                                    .frame(width: viewRect.width, height: viewRect.height)
                                    .offset(x: viewRect.minX, y: viewRect.minY)
                                    .accessibilityLabel("Detected face region")
                            }
                        }
                    }
                    .aspectRatio(
                        image.size.width / image.size.height,
                        contentMode: .fit
                    )
                    .padding(.horizontal)
                } else {
                    ContentUnavailableView(
                        "No Image Selected",
                        systemImage: "face.dashed",
                        description: Text("Pick a photo to detect faces.")
                    )
                }

                if model.isDetecting {
                    ProgressView("Detecting faces…")
                }

                if let error = model.errorMessage {
                    Text(error)
                        .foregroundStyle(.red)
                        .font(.caption)
                }

                if !model.faceRects.isEmpty {
                    Text("\(model.faceRects.count) face\(model.faceRects.count == 1 ? "" : "s") found")
                        .font(.subheadline)
                        .foregroundStyle(.secondary)
                }

                Spacer()
            }
            .navigationTitle("Face Detection")
        }
    }
}

// MARK: - Preview
#Preview {
    FaceDetectionView()
}
How it works
- VNDetectFaceRectanglesRequest + revision 3 — Pinning revision to VNDetectFaceRectanglesRequestRevision3 locks the model version so detection results stay consistent across OS updates. The request produces an array of VNFaceObservation objects, each carrying a boundingBox in normalized coordinates (0…1 on both axes, origin at bottom-left).
- Task.detached for responsiveness — VNImageRequestHandler.perform(_:) is synchronous and CPU-intensive. Wrapping it in Task.detached(priority: .userInitiated) moves it off the main actor, preventing UI jank while detection runs. Note that a detached task does not inherit its parent's cancellation, so a long detection will run to completion even if the view disappears.
- Y-axis flip in toViewRect(in:) — Vision's coordinate system has the origin at the bottom-left while SwiftUI places it at the top-left. The formula (1 - origin.y - height) * viewHeight correctly maps each face rectangle without clipping at the edges; a worked example follows this list.
- GeometryReader for live size tracking — The image renders at a size determined by .scaledToFit(), which changes with device orientation. Passing geo.size into toViewRect(in:) at draw time ensures boxes stay perfectly aligned even after rotation.
- @Observable FaceDetectionModel — Using the iOS 17 @Observable macro instead of ObservableObject removes boilerplate and means SwiftUI only re-renders the parts of the view that actually read a changed property — the progress spinner, error text, and face count update independently of the image.
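As a quick sanity check on the flip, here is a tiny worked example (illustrative numbers only) using the toViewRect(in:) extension above:

// Illustrative values: a face at normalized (0.2, 0.3, 0.4, 0.5)
// mapped into a 300×400-point view.
let box = CGRect(x: 0.2, y: 0.3, width: 0.4, height: 0.5)
let viewRect = box.toViewRect(in: CGSize(width: 300, height: 400))
// viewRect == CGRect(x: 60, y: 80, width: 120, height: 200)
// y: (1 - 0.3 - 0.5) * 400 = 80 — the box top sits 80 pt below the view's top,
// mirroring Vision's "bottom edge 120 pt above the view's bottom".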
Variants
Facial landmarks (eyes, nose, mouth)
Swap to VNDetectFaceLandmarksRequest and read
each observation's landmarks property for
detailed facial geometry (eye, eyebrow, nose, and lip contours). Useful for AR overlays or accessibility features.
func detectLandmarks(on cgImage: CGImage) throws -> [VNFaceObservation] {
    let request = VNDetectFaceLandmarksRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage)
    try handler.perform([request])
    return request.results ?? []
}

// In your drawing layer:
ForEach(Array(observations.enumerated()), id: \.offset) { _, obs in
    if let allPoints = obs.landmarks?.allPoints {
        // Landmarks are normalized relative to the face bounding box,
        // so convert the box to view space once, then place each point inside it.
        let faceRect = obs.boundingBox.toViewRect(in: viewSize)
        let points = allPoints.normalizedPoints.map { pt in
            CGPoint(
                x: faceRect.minX + pt.x * faceRect.width,
                y: faceRect.minY + (1 - pt.y) * faceRect.height
            )
        }
        Path { path in
            guard let first = points.first else { return }
            path.move(to: first)
            points.dropFirst().forEach { path.addLine(to: $0) }
        }
        .stroke(Color.yellow, lineWidth: 1)
    }
}
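If you only need part of the geometry, VNFaceLandmarks2D also exposes individual regions — leftEye, rightEye, nose, outerLips, faceContour, and so on — each with its own normalizedPoints array.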
Live camera feed with AVFoundation
For real-time detection, configure an
AVCaptureSession with an
AVCaptureVideoDataOutput and run
VNDetectFaceRectanglesRequest in the
captureOutput(_:didOutput:from:) delegate
callback. Keep preferBackgroundProcessing = false
(its default) on the request for lowest latency. Wrap the session's preview in a
UIViewControllerRepresentable. If you only need basic face presence rather than
Vision's precision, AVCaptureMetadataOutput with the .face metadata object type
is a simpler alternative.
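A minimal sketch of that delegate callback, assuming the AVCaptureSession, camera input, and preview layer are configured elsewhere; the .right orientation is an assumption for a portrait-held back camera:

import AVFoundation
import Vision

final class FaceFrameDelegate: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // Reuse one request across frames instead of allocating per callback.
    private let request = VNDetectFaceRectanglesRequest()

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let handler = VNImageRequestHandler(
            cvPixelBuffer: pixelBuffer,
            orientation: .right,   // assumption: portrait back camera
            options: [:]
        )
        do {
            // Synchronous, but this already runs on the output's background queue.
            try handler.perform([request])
            let boxes = request.results?.map(\.boundingBox) ?? []
            DispatchQueue.main.async {
                // Hand the normalized boxes to your overlay layer here.
                _ = boxes
            }
        } catch {
            // Dropping a frame on error is acceptable for a live preview.
        }
    }
}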
Common pitfalls
- ⚠️ Forgetting the Y-axis flip. Vision's origin is at the bottom-left; SwiftUI's is at the top-left. Skipping the (1 - y - h) transform places every bounding box vertically mirrored in the view, often completely off-screen.
- ⚠️ Running VNImageRequestHandler on the main thread. perform(_:) blocks the calling thread for the full inference duration (10–200 ms per image). Always dispatch it with Task.detached or a serial background queue to avoid dropped frames and visible UI hangs.
- ⚠️ Converting landmark coordinates with the wrong size. Landmark normalized points are relative to the face's bounding box, which is itself normalized to the original CGImage — not the SwiftUI view. Compute view rects from the rendered GeometryReader size, not UIImage.size, or boxes will drift when the image is letterboxed.
- ⚠️ Missing NSCameraUsageDescription. If you extend this to live camera, add NSCameraUsageDescription to your Info.plist, or the system will terminate the app the first time it touches the camera.
Prompt this with Claude Code
When using Soarias or Claude Code directly to implement this:
Implement face detection in SwiftUI for iOS 17+. Use Vision/VNFaceObservation and VNDetectFaceRectanglesRequest. Draw bounding boxes over detected faces using GeometryReader + ZStack. Handle Y-axis coordinate flip from Vision normalized space. Make it accessible (VoiceOver labels for each detected face region). Add a #Preview with realistic sample data.
Drop this prompt into Soarias during the Build phase after you've locked your screen designs — the Vision scaffolding is boilerplate-heavy, and Claude Code will generate the request handler, coordinate math, and async dispatch correctly in one pass, leaving you to focus on the product layer.
FAQ
Does this work on iOS 16?
VNDetectFaceRectanglesRequest itself is
available back to iOS 11, but the @Observable
macro and ContentUnavailableView used in the
full implementation require iOS 17. Replace those two pieces with
@StateObject /
ObservableObject and a plain
VStack placeholder and the detection logic
runs fine on iOS 16. The code on this page targets iOS 17+ as-is.
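A minimal sketch of that substitution, keeping the same property names as the iOS 17 model above:

import SwiftUI
import PhotosUI

// iOS 16 fallback: ObservableObject instead of the @Observable macro.
final class FaceDetectionModel: ObservableObject {
    @Published var selectedItem: PhotosPickerItem?
    @Published var displayImage: UIImage?
    @Published var faceRects: [CGRect] = []
    @Published var isDetecting = false
    @Published var errorMessage: String?
    // loadAndDetect() and runDetection(on:) stay exactly as above.
}

// In the view, swap @State for @StateObject:
// @StateObject private var model = FaceDetectionModel()
// On iOS 16 also use the single-parameter onChange:
// .onChange(of: model.selectedItem) { _ in Task { await model.loadAndDetect() } }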
Can I detect face attributes like age or emotion?
Apple's on-device Vision framework deliberately does not expose age, emotion, or identity
attributes — only geometry (bounding boxes, landmarks, roll/yaw angles, and capture quality
score via VNFaceObservation.faceCaptureQuality).
For richer attributes you'd need to feed the cropped face region into a custom
VNCoreMLRequest with a trained model, or
use a server-side API. This keeps the Vision API privacy-preserving and App Store safe.
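For illustration only, a sketch of that custom-model route; FaceAttributeClassifier is a hypothetical Core ML model class standing in for one you would train and bundle yourself:

import CoreML
import Vision

// Hypothetical: classify a cropped face region with a bundled Core ML model.
// "FaceAttributeClassifier" does not ship with Vision — it is a placeholder.
func classifyFace(_ faceCrop: CGImage) throws -> [VNClassificationObservation] {
    let mlModel = try FaceAttributeClassifier(configuration: MLModelConfiguration()).model
    let request = VNCoreMLRequest(model: try VNCoreMLModel(for: mlModel))
    try VNImageRequestHandler(cgImage: faceCrop).perform([request])
    return request.results as? [VNClassificationObservation] ?? []
}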
What's the UIKit equivalent?
In UIKit you'd still use the same Vision request, but overlay bounding boxes using a
CAShapeLayer added to the
UIImageView's layer. The coordinate conversion
math is identical — Vision's normalized space is framework-level, not tied to SwiftUI. The
main difference is that you compute the layer frame using
AVMakeRect(aspectRatio:insideRect:) to account
for contentMode = .scaleAspectFit letterboxing.
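A minimal UIKit sketch of that conversion, assuming the image view uses .scaleAspectFit and reusing the same Y-flip as the SwiftUI version:

import AVFoundation
import UIKit

func overlayFaces(_ boxes: [CGRect], on imageView: UIImageView) {
    guard let image = imageView.image else { return }
    // The rect the image actually occupies inside the (possibly letterboxed) view.
    let fitted = AVMakeRect(aspectRatio: image.size, insideRect: imageView.bounds)
    for box in boxes {
        // Same Y-flip as toViewRect(in:), offset by the letterbox origin.
        let frame = CGRect(
            x: fitted.minX + box.minX * fitted.width,
            y: fitted.minY + (1 - box.minY - box.height) * fitted.height,
            width: box.width * fitted.width,
            height: box.height * fitted.height
        )
        let layer = CAShapeLayer()
        layer.path = UIBezierPath(rect: CGRect(origin: .zero, size: frame.size)).cgPath
        layer.frame = frame
        layer.strokeColor = UIColor.systemTeal.cgColor
        layer.fillColor = UIColor.clear.cgColor
        layer.lineWidth = 2.5
        imageView.layer.addSublayer(layer)
    }
}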
Last reviewed: 2026-05-12 by the Soarias team.