How to Build an OCR Feature in SwiftUI
Create a VNRecognizeTextRequest, run it through a VNImageRequestHandler on a CGImage, then join the top candidates from the resulting VNRecognizedTextObservation array into a string. Wrap the Vision call in a withCheckedContinuation so it fits cleanly into Swift Concurrency.
import UIKit
import Vision

func recognizeText(in image: UIImage) async -> String {
    guard let cgImage = image.cgImage else { return "" }
    return await withCheckedContinuation { continuation in
        let request = VNRecognizeTextRequest { req, _ in
            let text = (req.results as? [VNRecognizedTextObservation])?
                .compactMap { $0.topCandidates(1).first?.string }
                .joined(separator: "\n") ?? ""
            continuation.resume(returning: text)
        }
        request.recognitionLevel = .accurate
        request.usesLanguageCorrection = true
        do {
            try VNImageRequestHandler(cgImage: cgImage).perform([request])
        } catch {
            // If perform throws, the completion handler never fires;
            // resume here so the continuation is not leaked.
            continuation.resume(returning: "")
        }
    }
}
Full implementation
The approach pairs an @Observable view-model with PhotosPicker so users can select any photo from their library and immediately get machine-readable text back. The Vision request runs off the main actor inside a checked continuation, keeping the UI responsive during recognition. The result is rendered in a .textSelection(.enabled) view so users can copy, share, or further process the extracted text without extra tap targets.
import SwiftUI
import Vision
import PhotosUI

// MARK: - View-model

@MainActor
@Observable
final class OCRViewModel {
    var recognizedText: String = ""
    var isProcessing: Bool = false
    var selectedItem: PhotosPickerItem?
    var displayImage: UIImage?
    var errorMessage: String?

    func processSelectedItem(_ item: PhotosPickerItem?) async {
        guard let item else { return }
        isProcessing = true
        errorMessage = nil
        defer { isProcessing = false }
        do {
            guard let data = try await item.loadTransferable(type: Data.self),
                  let uiImage = UIImage(data: data) else {
                errorMessage = "Could not load image data."
                return
            }
            displayImage = uiImage
            recognizedText = try await recognizeText(in: uiImage)
        } catch {
            errorMessage = error.localizedDescription
        }
    }

    func copyToClipboard() {
        UIPasteboard.general.string = recognizedText
    }

    // MARK: - Vision

    // `nonisolated` keeps the CPU-bound Vision work off the main actor,
    // while the rest of the model stays on @MainActor for safe UI updates.
    nonisolated private func recognizeText(in image: UIImage) async throws -> String {
        guard let cgImage = image.cgImage else {
            throw OCRError.invalidImage
        }
        return try await withCheckedThrowingContinuation { continuation in
            let request = VNRecognizeTextRequest { req, error in
                if let error {
                    continuation.resume(throwing: error)
                    return
                }
                let observations = req.results as? [VNRecognizedTextObservation] ?? []
                let text = observations
                    .compactMap { $0.topCandidates(1).first?.string }
                    .joined(separator: "\n")
                continuation.resume(returning: text.isEmpty ? "No text detected." : text)
            }
            request.recognitionLevel = .accurate
            request.usesLanguageCorrection = true
            request.automaticallyDetectsLanguage = true
            let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
            do {
                try handler.perform([request])
            } catch {
                continuation.resume(throwing: error)
            }
        }
    }
}

// MARK: - Error

enum OCRError: LocalizedError {
    case invalidImage
    var errorDescription: String? { "The selected image could not be converted for analysis." }
}

// MARK: - View

struct OCRView: View {
    @State private var viewModel = OCRViewModel()

    var body: some View {
        NavigationStack {
            ScrollView {
                VStack(alignment: .leading, spacing: 20) {
                    // Image picker
                    PhotosPicker(
                        selection: $viewModel.selectedItem,
                        matching: .images
                    ) {
                        Label("Select Image", systemImage: "photo.badge.plus")
                            .frame(maxWidth: .infinity)
                            .padding()
                            .background(.thinMaterial)
                            .clipShape(RoundedRectangle(cornerRadius: 12))
                    }
                    .accessibilityLabel("Open photo library to select an image for text recognition")

                    // Preview
                    if let image = viewModel.displayImage {
                        Image(uiImage: image)
                            .resizable()
                            .scaledToFit()
                            .clipShape(RoundedRectangle(cornerRadius: 12))
                            .frame(maxHeight: 300)
                            .accessibilityLabel("Selected image preview")
                    }

                    // State
                    if viewModel.isProcessing {
                        HStack {
                            ProgressView()
                            Text("Recognizing text…")
                                .foregroundStyle(.secondary)
                        }
                        .frame(maxWidth: .infinity, alignment: .center)
                    } else if let error = viewModel.errorMessage {
                        Text(error)
                            .foregroundStyle(.red)
                            .font(.footnote)
                    } else if !viewModel.recognizedText.isEmpty {
                        VStack(alignment: .leading, spacing: 8) {
                            HStack {
                                Text("Recognized Text")
                                    .font(.headline)
                                Spacer()
                                Button("Copy", systemImage: "doc.on.doc") {
                                    viewModel.copyToClipboard()
                                }
                                .font(.subheadline)
                                .accessibilityLabel("Copy recognized text to clipboard")
                            }
                            Text(viewModel.recognizedText)
                                .font(.body)
                                .textSelection(.enabled)
                                .frame(maxWidth: .infinity, alignment: .leading)
                                .padding()
                                .background(.thinMaterial)
                                .clipShape(RoundedRectangle(cornerRadius: 12))
                        }
                    }
                }
                .padding()
            }
            .navigationTitle("OCR Scanner")
            .navigationBarTitleDisplayMode(.large)
            .onChange(of: viewModel.selectedItem) { _, newItem in
                Task { await viewModel.processSelectedItem(newItem) }
            }
        }
    }
}

// MARK: - Preview

#Preview {
    OCRView()
}
How it works
- @Observable view-model. The OCRViewModel class uses the iOS 17 @Observable macro instead of ObservableObject. Properties like isProcessing and recognizedText automatically trigger SwiftUI updates without @Published boilerplate.
- PhotosPickerItem → CGImage pipeline. item.loadTransferable(type: Data.self) fetches the raw image bytes asynchronously; they are decoded into a UIImage and from there into the CGImage that Vision requires. This avoids loading the full PHAsset.
- withCheckedThrowingContinuation bridge. Vision's completion-handler API predates Swift Concurrency. The continuation wrapper in recognizeText(in:) bridges it cleanly: errors from VNImageRequestHandler.perform are forwarded as throws rather than silently swallowed.
- Recognition options. request.recognitionLevel = .accurate uses the neural-network path (slower but higher quality). usesLanguageCorrection = true applies an on-device language model to fix common OCR mistakes. automaticallyDetectsLanguage = true (available since iOS 16) removes the need to hard-code a recognition language.
- .textSelection(.enabled). Applied to the results Text view so users can tap and hold to select a word, sentence, or the whole block, matching the native system behaviour they expect from a document scanner.
Variants
Live camera OCR with AVFoundation
For real-time recognition — think receipt scanning or document capture — feed sample buffers directly from AVCaptureSession into VNImageRequestHandler using the .init(cvPixelBuffer:orientation:options:) initialiser.
// Inside AVCaptureVideoDataOutputSampleBufferDelegate
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let request = VNRecognizeTextRequest { [weak self] req, _ in
        let lines = (req.results as? [VNRecognizedTextObservation])?
            .compactMap { $0.topCandidates(1).first?.string } ?? []
        DispatchQueue.main.async {
            self?.liveText = lines.joined(separator: " ")
        }
    }
    request.recognitionLevel = .fast // .fast for live preview
    request.usesLanguageCorrection = false
    let handler = VNImageRequestHandler(
        cvPixelBuffer: pixelBuffer,
        orientation: .right, // suits a portrait device with the back camera; adjust for your setup
        options: [:]
    )
    try? handler.perform([request])
}
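The delegate above assumes an already-configured capture session. Here is a minimal sketch of that setup, under the assumption that a CameraController class owns the session and the liveText property (both names are illustrative, not part of the code above):

import AVFoundation

// Owns the session and receives the sample buffers shown above.
final class CameraController: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    var liveText = "" // publish to the UI however your architecture prefers

    func configure() {
        session.beginConfiguration()
        defer { session.commitConfiguration() }
        guard let device = AVCaptureDevice.default(for: .video),
              let input = try? AVCaptureDeviceInput(device: device),
              session.canAddInput(input) else { return }
        session.addInput(input)

        let output = AVCaptureVideoDataOutput()
        // Drop late frames instead of queueing them behind slow OCR passes.
        output.alwaysDiscardsLateVideoFrames = true
        // Deliver frames on a dedicated queue so Vision work stays off main.
        output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "ocr.video"))
        guard session.canAddOutput(output) else { return }
        session.addOutput(output)
    }

    func start() {
        // startRunning() blocks, so call it off the main thread.
        DispatchQueue.global(qos: .userInitiated).async { [session] in
            session.startRunning()
        }
    }
}

Camera capture also requires the NSCameraUsageDescription key in Info.plist.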
Bounding-box highlights
Each VNRecognizedTextObservation carries a boundingBox in normalised Vision coordinates (origin at bottom-left, 0–1 range). Convert it to UIKit coordinates with VNImageRectForNormalizedRect(_:_:_:), then draw coloured overlays using a Canvas or Path in SwiftUI. This is useful for document scanners that need to highlight detected fields before the user confirms extraction.
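As a minimal sketch, assuming you keep the observations array around after recognition (the HighlightOverlay name is illustrative, not from the code above):

import SwiftUI
import Vision

// Draws a red outline over each recognized-text bounding box.
// The Canvas must be laid out to exactly match the displayed image's
// aspect-fitted frame, or the boxes will drift.
struct HighlightOverlay: View {
    let observations: [VNRecognizedTextObservation]

    var body: some View {
        Canvas { context, size in
            for observation in observations {
                // Vision's boundingBox is normalised with a bottom-left origin;
                // scale it up to the canvas size first.
                let rect = VNImageRectForNormalizedRect(
                    observation.boundingBox,
                    Int(size.width),
                    Int(size.height)
                )
                // Flip the y-axis to match SwiftUI's top-left origin.
                let flipped = CGRect(
                    x: rect.minX,
                    y: size.height - rect.maxY,
                    width: rect.width,
                    height: rect.height
                )
                context.stroke(
                    Path(roundedRect: flipped, cornerRadius: 2),
                    with: .color(.red),
                    lineWidth: 2
                )
            }
        }
        .allowsHitTesting(false)
    }
}

Apply it with .overlay { HighlightOverlay(observations: …) } on the Image view, sized to the same aspect-fitted frame as the image itself.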
Common pitfalls
- 🚫 Missing NSPhotoLibraryUsageDescription. PhotosPicker itself runs out of process, so it needs no permission prompt, but any direct PHPhotoLibrary access (for example, fetching the full PHAsset instead of using loadTransferable) does require the NSPhotoLibraryUsageDescription key in Info.plist, and omitting it crashes at runtime the moment that access happens.
- 🚫 Calling handler.perform on the main thread. VNImageRequestHandler.perform is synchronous and CPU-bound. In the code above, recognizeText(in:) is nonisolated, so Swift Concurrency runs it off the main actor; if you inline the same call into a @MainActor context instead, move it into a nonisolated function or a detached task (see the sketch after this list).
- 🚫 Using the .fast recognition level for static images. .fast is designed for live video buffers; on a still photo it can miss noticeably more text than .accurate. Always use .accurate when the user selects a single image from the library.
- 🚫 Forgetting VoiceOver labels on the results area. .textSelection(.enabled) changes how the text is exposed to assistive technologies; pair it with an explicit .accessibilityLabel on the container so VoiceOver announces the region correctly before a user tries to navigate the content.
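To make the second pitfall concrete, here is a sketch of a safe call site inside a @MainActor type, reusing the free recognizeText(in:) function from the top of the article (ScanScreenModel is an illustrative name):

import UIKit

@MainActor
final class ScanScreenModel {
    var recognizedText = ""

    func scan(_ image: UIImage) async {
        // Task.detached never inherits the main actor, so the CPU-bound
        // Vision work runs on a background executor; the assignment
        // resumes back on the main actor afterwards.
        recognizedText = await Task.detached(priority: .userInitiated) {
            await recognizeText(in: image)
        }.value
    }
}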
Prompt this with Claude Code
When using Soarias or Claude Code directly to implement this:
Implement an OCR feature in SwiftUI for iOS 17+. Use Vision/VNRecognizeTextRequest with .accurate recognition level, usesLanguageCorrection = true, and automaticallyDetectsLanguage = true. Wrap the Vision call in withCheckedThrowingContinuation for async/await compatibility. Use PhotosPicker (PhotosUI) to let users select an image. Display results in a .textSelection(.enabled) Text view with a Copy button. Make it accessible (VoiceOver labels on picker, image preview, and results area). Add a #Preview with realistic sample data (a UIImage from SF Symbols or a bundled asset).
In the Soarias Build phase, paste this prompt into the active session after scaffolding your screen list — Claude Code will generate the full file, wire up the @Observable model, and surface any missing Info.plist keys before you ever run the simulator.
FAQ
Does this work on iOS 16?
Most of it does. VNRecognizeTextRequest is available from iOS 13, and PhotosPicker from iOS 16. However, @Observable requires iOS 17 — swap it for ObservableObject + @Published if you need iOS 16 support, and replace the #Preview macro with a PreviewProvider.
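A minimal sketch of that fallback, with illustrative names; note that the two-parameter onChange(of:) overload used above is itself iOS 17+, so iOS 16 needs the older single-value variant:

import SwiftUI
import PhotosUI

// iOS 16 fallback: ObservableObject + @Published instead of @Observable.
final class LegacyOCRViewModel: ObservableObject {
    @Published var recognizedText = ""
    @Published var isProcessing = false
    @Published var selectedItem: PhotosPickerItem?
    // processSelectedItem(_:) and the Vision bridge are unchanged from OCRViewModel.
}

struct LegacyOCRView: View {
    @StateObject private var viewModel = LegacyOCRViewModel()

    var body: some View {
        // Same layout as OCRView; only the wrappers and onChange differ.
        Text(viewModel.recognizedText)
            .onChange(of: viewModel.selectedItem) { newItem in
                // iOS 16 single-value overload.
                // Task { await viewModel.processSelectedItem(newItem) }
            }
    }
}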
How accurate is on-device OCR compared to a cloud API?
For Latin-script printed text, Vision with .accurate approaches cloud-quality results with no network latency, no per-request cost, and full offline support. Handwriting recognition, dense multi-column layouts, and non-Latin scripts (e.g. Arabic, CJK) still benefit from a cloud model. For most document and receipt scanning use cases on iOS 17 devices, the on-device path is production-ready, and recognising a typical document page completes in well under a second on recent hardware.
What is the UIKit equivalent?
In UIKit you'd present a PHPickerViewController (or a UIImagePickerController), receive the image in the delegate callback, then run the exact same Vision pipeline. The Vision layer is framework-agnostic; only the image input and result display differ between UIKit and SwiftUI. VisionKit's VNDocumentCameraViewController (available since iOS 13) additionally combines capture and perspective correction in a single system sheet, and you run the same Vision text request on each page it returns.
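For illustration, a sketch of that UIKit flow feeding the same free recognizeText(in:) function from the top of the article (the controller wiring is an assumption, not Apple sample code):

import UIKit
import PhotosUI

final class OCRViewController: UIViewController, PHPickerViewControllerDelegate {

    func presentPicker() {
        var config = PHPickerConfiguration()
        config.filter = .images
        let picker = PHPickerViewController(configuration: config)
        picker.delegate = self
        present(picker, animated: true)
    }

    func picker(_ picker: PHPickerViewController,
                didFinishPicking results: [PHPickerResult]) {
        picker.dismiss(animated: true)
        guard let provider = results.first?.itemProvider,
              provider.canLoadObject(ofClass: UIImage.self) else { return }
        provider.loadObject(ofClass: UIImage.self) { object, _ in
            guard let image = object as? UIImage else { return }
            Task { @MainActor in
                // Same Vision pipeline as the SwiftUI version.
                let text = await recognizeText(in: image)
                print(text) // update your label or text view here
            }
        }
    }
}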
Last reviewed: 2026-05-11 by the Soarias team.