How to Build an OCR Feature in SwiftUI
Create a VNRecognizeTextRequest, run it through a VNImageRequestHandler on a CGImage, then join the top candidates from the resulting VNRecognizedTextObservation array into a string. Wrap the Vision call in a withCheckedContinuation so it fits cleanly into Swift Concurrency.
import UIKit
import Vision

func recognizeText(in image: UIImage) async -> String {
    guard let cgImage = image.cgImage else { return "" }
    return await withCheckedContinuation { continuation in
        let request = VNRecognizeTextRequest { req, _ in
            let text = (req.results as? [VNRecognizedTextObservation])?
                .compactMap { $0.topCandidates(1).first?.string }
                .joined(separator: "\n") ?? ""
            continuation.resume(returning: text)
        }
        request.recognitionLevel = .accurate
        request.usesLanguageCorrection = true
        do {
            try VNImageRequestHandler(cgImage: cgImage).perform([request])
        } catch {
            // If perform throws, the completion handler never fires;
            // resume here so the continuation is not leaked.
            continuation.resume(returning: "")
        }
    }
}
Full implementation
The approach pairs an @Observable view-model with PhotosPicker so users can select any photo from their library and immediately get machine-readable text back. The Vision request runs off the main actor inside a checked continuation, keeping the UI responsive during recognition. The result is rendered in a .textSelection(.enabled) view so users can copy, share, or further process the extracted text without extra tap targets.
import SwiftUI
import Vision
import PhotosUI

// MARK: - View-model

@MainActor
@Observable
final class OCRViewModel {
    var recognizedText: String = ""
    var isProcessing: Bool = false
    var selectedItem: PhotosPickerItem?
    var displayImage: UIImage?
    var errorMessage: String?

    func processSelectedItem(_ item: PhotosPickerItem?) async {
        guard let item else { return }
        isProcessing = true
        errorMessage = nil
        defer { isProcessing = false }
        do {
            guard let data = try await item.loadTransferable(type: Data.self),
                  let uiImage = UIImage(data: data) else {
                errorMessage = "Could not load image data."
                return
            }
            displayImage = uiImage
            recognizedText = try await recognizeText(in: uiImage)
        } catch {
            errorMessage = error.localizedDescription
        }
    }

    func copyToClipboard() {
        UIPasteboard.general.string = recognizedText
    }

    // MARK: - Vision

    // `nonisolated` keeps the CPU-bound Vision work off the main actor,
    // while the rest of the model stays on @MainActor for safe UI updates.
    nonisolated private func recognizeText(in image: UIImage) async throws -> String {
        guard let cgImage = image.cgImage else {
            throw OCRError.invalidImage
        }
        return try await withCheckedThrowingContinuation { continuation in
            let request = VNRecognizeTextRequest { req, error in
                if let error {
                    continuation.resume(throwing: error)
                    return
                }
                let observations = req.results as? [VNRecognizedTextObservation] ?? []
                let text = observations
                    .compactMap { $0.topCandidates(1).first?.string }
                    .joined(separator: "\n")
                continuation.resume(returning: text.isEmpty ? "No text detected." : text)
            }
            request.recognitionLevel = .accurate
            request.usesLanguageCorrection = true
            request.automaticallyDetectsLanguage = true
            let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
            do {
                try handler.perform([request])
            } catch {
                continuation.resume(throwing: error)
            }
        }
    }
}

// MARK: - Error

enum OCRError: LocalizedError {
    case invalidImage
    var errorDescription: String? { "The selected image could not be converted for analysis." }
}

// MARK: - View

struct OCRView: View {
    @State private var viewModel = OCRViewModel()

    var body: some View {
        NavigationStack {
            ScrollView {
                VStack(alignment: .leading, spacing: 20) {
                    // Image picker
                    PhotosPicker(
                        selection: $viewModel.selectedItem,
                        matching: .images
                    ) {
                        Label("Select Image", systemImage: "photo.badge.plus")
                            .frame(maxWidth: .infinity)
                            .padding()
                            .background(.thinMaterial)
                            .clipShape(RoundedRectangle(cornerRadius: 12))
                    }
                    .accessibilityLabel("Open photo library to select an image for text recognition")

                    // Preview
                    if let image = viewModel.displayImage {
                        Image(uiImage: image)
                            .resizable()
                            .scaledToFit()
                            .clipShape(RoundedRectangle(cornerRadius: 12))
                            .frame(maxHeight: 300)
                            .accessibilityLabel("Selected image preview")
                    }

                    // State
                    if viewModel.isProcessing {
                        HStack {
                            ProgressView()
                            Text("Recognizing text…")
                                .foregroundStyle(.secondary)
                        }
                        .frame(maxWidth: .infinity, alignment: .center)
                    } else if let error = viewModel.errorMessage {
                        Text(error)
                            .foregroundStyle(.red)
                            .font(.footnote)
                    } else if !viewModel.recognizedText.isEmpty {
                        VStack(alignment: .leading, spacing: 8) {
                            HStack {
                                Text("Recognized Text")
                                    .font(.headline)
                                Spacer()
                                Button("Copy", systemImage: "doc.on.doc") {
                                    viewModel.copyToClipboard()
                                }
                                .font(.subheadline)
                                .accessibilityLabel("Copy recognized text to clipboard")
                            }
                            Text(viewModel.recognizedText)
                                .font(.body)
                                .textSelection(.enabled)
                                .frame(maxWidth: .infinity, alignment: .leading)
                                .padding()
                                .background(.thinMaterial)
                                .clipShape(RoundedRectangle(cornerRadius: 12))
                        }
                    }
                }
                .padding()
            }
            .navigationTitle("OCR Scanner")
            .navigationBarTitleDisplayMode(.large)
            .onChange(of: viewModel.selectedItem) { _, newItem in
                Task { await viewModel.processSelectedItem(newItem) }
            }
        }
    }
}

// MARK: - Preview

#Preview {
    OCRView()
}
How it works
- @Observable view-model. The OCRViewModel class uses the iOS 17 @Observable macro instead of ObservableObject. Properties like isProcessing and recognizedText automatically trigger SwiftUI updates without @Published boilerplate.
- PhotosPickerItem → CGImage pipeline. item.loadTransferable(type: Data.self) fetches the raw image bytes asynchronously; they are decoded into a UIImage and from there into the CGImage that Vision requires. This avoids loading the full PHAsset.
- withCheckedThrowingContinuation bridge. Vision's completion-handler API predates Swift Concurrency. The continuation wrapper in recognizeText(in:) bridges it cleanly: errors from VNImageRequestHandler.perform are forwarded as throws rather than silently swallowed.
- Recognition options. request.recognitionLevel = .accurate uses the neural-network path (slower but higher quality). usesLanguageCorrection = true applies an on-device language model to fix common OCR mistakes. automaticallyDetectsLanguage = true (available since iOS 16) removes the need to hard-code a recognition language.
- .textSelection(.enabled). Applied to the results Text view so users can tap and hold to select a word, sentence, or the whole block, matching the native system behaviour they expect from a document scanner.
Variants
Live camera OCR with AVFoundation
For real-time recognition — think receipt scanning or document capture — feed sample buffers directly from AVCaptureSession into VNImageRequestHandler using the .init(cvPixelBuffer:orientation:options:) initialiser.
// Inside AVCaptureVideoDataOutputSampleBufferDelegate
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let request = VNRecognizeTextRequest { [weak self] req, _ in
        let lines = (req.results as? [VNRecognizedTextObservation])?
            .compactMap { $0.topCandidates(1).first?.string } ?? []
        DispatchQueue.main.async {
            self?.liveText = lines.joined(separator: " ")
        }
    }
    request.recognitionLevel = .fast // .fast for live preview
    request.usesLanguageCorrection = false
    let handler = VNImageRequestHandler(
        cvPixelBuffer: pixelBuffer,
        orientation: .right, // suits a portrait device with the back camera; adjust for your setup
        options: [:]
    )
    try? handler.perform([request])
}
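The delegate above assumes an already-configured capture session. Here is a minimal sketch of that setup, under the assumption that a CameraController class owns the session and the liveText property (both names are illustrative, not part of the code above):

import AVFoundation

// Owns the session and receives the sample buffers shown above.
final class CameraController: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    var liveText = "" // publish to the UI however your architecture prefers

    func configure() {
        session.beginConfiguration()
        defer { session.commitConfiguration() }
        guard let device = AVCaptureDevice.default(for: .video),
              let input = try? AVCaptureDeviceInput(device: device),
              session.canAddInput(input) else { return }
        session.addInput(input)

        let output = AVCaptureVideoDataOutput()
        // Drop late frames instead of queueing them behind slow OCR passes.
        output.alwaysDiscardsLateVideoFrames = true
        // Deliver frames on a dedicated queue so Vision work stays off main.
        output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "ocr.video"))
        guard session.canAddOutput(output) else { return }
        session.addOutput(output)
    }

    func start() {
        // startRunning() blocks, so call it off the main thread.
        DispatchQueue.global(qos: .userInitiated).async { [session] in
            session.startRunning()
        }
    }
}

Camera capture also requires the NSCameraUsageDescription key in Info.plist.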
Bounding-box highlights
Each VNRecognizedTextObservation carries a boundingBox in normalised Vision coordinates (origin at bottom-left, 0–1 range). Convert it to UIKit coordinates with VNImageRectForNormalizedRect(_:_:_:), then draw coloured overlays using a Canvas or Path in SwiftUI. This is useful for document scanners that need to highlight detected fields before the user confirms extraction.
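As a minimal sketch, assuming you keep the observations array around after recognition (the HighlightOverlay name is illustrative, not from the code above):

import SwiftUI
import Vision

// Draws a red outline over each recognized-text bounding box.
// The Canvas must be laid out to exactly match the displayed image's
// aspect-fitted frame, or the boxes will drift.
struct HighlightOverlay: View {
    let observations: [VNRecognizedTextObservation]

    var body: some View {
        Canvas { context, size in
            for observation in observations {
                // Vision's boundingBox is normalised with a bottom-left origin;
                // scale it up to the canvas size first.
                let rect = VNImageRectForNormalizedRect(
                    observation.boundingBox,
                    Int(size.width),
                    Int(size.height)
                )
                // Flip the y-axis to match SwiftUI's top-left origin.
                let flipped = CGRect(
                    x: rect.minX,
                    y: size.height - rect.maxY,
                    width: rect.width,
                    height: rect.height
                )
                context.stroke(
                    Path(roundedRect: flipped, cornerRadius: 2),
                    with: .color(.red),
                    lineWidth: 2
                )
            }
        }
        .allowsHitTesting(false)
    }
}

Apply it with .overlay { HighlightOverlay(observations: …) } on the Image view, sized to the same aspect-fitted frame as the image itself.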
Common pitfalls
- 🚫 Missing NSPhotoLibraryUsageDescription. PhotosPicker itself runs out of process, so it needs no permission prompt, but any direct PHPhotoLibrary access (for example, fetching the full PHAsset instead of using loadTransferable) does require the NSPhotoLibraryUsageDescription key in Info.plist, and omitting it crashes at runtime the moment that access happens.
- 🚫 Calling handler.perform on the main thread. VNImageRequestHandler.perform is synchronous and CPU-bound. In the code above, recognizeText(in:) is nonisolated, so Swift Concurrency runs it off the main actor; if you inline the same call into a @MainActor context instead, move it into a nonisolated function or a detached task (see the sketch after this list).
- 🚫 Using the .fast recognition level for static images. .fast is designed for live video buffers; on a still photo it can miss noticeably more text than .accurate. Always use .accurate when the user selects a single image from the library.
- 🚫 Forgetting VoiceOver labels on the results area. .textSelection(.enabled) changes how the text is exposed to assistive technologies; pair it with an explicit .accessibilityLabel on the container so VoiceOver announces the region correctly before a user tries to navigate the content.
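To make the second pitfall concrete, here is a sketch of a safe call site inside a @MainActor type, reusing the free recognizeText(in:) function from the top of the article (ScanScreenModel is an illustrative name):

import UIKit

@MainActor
final class ScanScreenModel {
    var recognizedText = ""

    func scan(_ image: UIImage) async {
        // Task.detached never inherits the main actor, so the CPU-bound
        // Vision work runs on a background executor; the assignment
        // resumes back on the main actor afterwards.
        recognizedText = await Task.detached(priority: .userInitiated) {
            await recognizeText(in: image)
        }.value
    }
}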
Prompt this with Claude Code
When using Soarias or Claude Code directly to implement this:
Implement an OCR feature in SwiftUI for iOS 17+. Use Vision/VNRecognizeTextRequest with .accurate recognition level, usesLanguageCorrection = true, and automaticallyDetectsLanguage = true. Wrap the Vision call in withCheckedThrowingContinuation for async/await compatibility. Use PhotosPicker (PhotosUI) to let users select an image. Display results in a .textSelection(.enabled) Text view with a Copy button. Make it accessible (VoiceOver labels on picker, image preview, and results area). Add a #Preview with realistic sample data (a UIImage from SF Symbols or a bundled asset).
In the Soarias Build phase, paste this prompt into the active session after scaffolding your screen list — Claude Code will generate the full file, wire up the @Observable model, and surface any missing Info.plist keys before you ever run the simulator.
FAQ
Does this work on iOS 16?
Most of it does. VNRecognizeTextRequest is available from iOS 13, and PhotosPicker from iOS 16. However, @Observable requires iOS 17 — swap it for ObservableObject + @Published if you need iOS 16 support, and replace the #Preview macro with a PreviewProvider.
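A minimal sketch of that fallback, with illustrative names; note that the two-parameter onChange(of:) overload used above is itself iOS 17+, so iOS 16 needs the older single-value variant:

import SwiftUI
import PhotosUI

// iOS 16 fallback: ObservableObject + @Published instead of @Observable.
final class LegacyOCRViewModel: ObservableObject {
    @Published var recognizedText = ""
    @Published var isProcessing = false
    @Published var selectedItem: PhotosPickerItem?
    // processSelectedItem(_:) and the Vision bridge are unchanged from OCRViewModel.
}

struct LegacyOCRView: View {
    @StateObject private var viewModel = LegacyOCRViewModel()

    var body: some View {
        // Same layout as OCRView; only the wrappers and onChange differ.
        Text(viewModel.recognizedText)
            .onChange(of: viewModel.selectedItem) { newItem in
                // iOS 16 single-value overload.
                // Task { await viewModel.processSelectedItem(newItem) }
            }
    }
}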
How accurate is on-device OCR compared to a cloud API?
For Latin-script printed text, Vision with .accurate approaches cloud-quality results with no network latency, no per-request cost, and full offline support. Handwriting recognition, dense multi-column layouts, and non-Latin scripts (e.g. Arabic, CJK) still benefit from a cloud model. For most document and receipt scanning use cases on iOS 17 devices, the on-device path is production-ready, and recognising a typical document page completes in well under a second on recent hardware.
What is the UIKit equivalent?
In UIKit you'd present a PHPickerViewController (or a UIImagePickerController), receive the image in the delegate callback, then run the exact same Vision pipeline. The Vision layer is framework-agnostic; only the image input and result display differ between UIKit and SwiftUI. VisionKit's VNDocumentCameraViewController (available since iOS 13) additionally combines capture and perspective correction in a single system sheet, and you run the same Vision text request on each page it returns.
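For illustration, a sketch of that UIKit flow feeding the same free recognizeText(in:) function from the top of the article (the controller wiring is an assumption, not Apple sample code):

import UIKit
import PhotosUI

final class OCRViewController: UIViewController, PHPickerViewControllerDelegate {

    func presentPicker() {
        var config = PHPickerConfiguration()
        config.filter = .images
        let picker = PHPickerViewController(configuration: config)
        picker.delegate = self
        present(picker, animated: true)
    }

    func picker(_ picker: PHPickerViewController,
                didFinishPicking results: [PHPickerResult]) {
        picker.dismiss(animated: true)
        guard let provider = results.first?.itemProvider,
              provider.canLoadObject(ofClass: UIImage.self) else { return }
        provider.loadObject(ofClass: UIImage.self) { object, _ in
            guard let image = object as? UIImage else { return }
            Task { @MainActor in
                // Same Vision pipeline as the SwiftUI version.
                let text = await recognizeText(in: image)
                print(text) // update your label or text view here
            }
        }
    }
}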
Last reviewed: 2026-05-11 by the Soarias team.