DepthAI YuNet: Fast Face Detection on OAK Cameras
Hands-on review of DepthAI YuNet, an open-source project for running the lightweight YuNet face detection model on Luxonis OAK cameras via DepthAI.
Running a neural network on the edge can be a pain — especially with face detection, where you need both speed and accuracy. DepthAI YuNet solves that by combining the lightweight YuNet model with Luxonis' DepthAI hardware (OAK‑1, OAK‑D). This open‑source Python project runs real‑time face detection directly on the camera's Myriad X VPU, offloading processing from your main computer. Because inference stays on‑device, video data never leaves the camera — a privacy win for sensitive applications.
What is DepthAI YuNet?
DepthAI YuNet is a wrapper that runs the YuNet face detection model on DepthAI cameras. YuNet is a compact, efficient model from OpenCV that reaches an average precision of 0.834 on the WIDER Face validation set (Easy subset). The project provides pre‑compiled models (blobs) for several resolutions, along with scripts to generate custom ones. It supports two modes: Host mode (postprocessing on the host CPU) and Edge mode (postprocessing on the device), giving developers the flexibility to trade off CPU usage against latency.
How It Works: Host vs. Edge Mode
The architecture is straightforward:
- Two models: the YuNet ONNX model, converted to OpenVINO IR and then to a Myriad X blob, plus a separate postprocessing model used in Edge mode that performs score filtering and non‑maximum suppression (NMS).
- Input resolution: blobs are fixed‑size (e.g., 180×320, 270×480). You choose a resolution that balances speed and accuracy.
| Resolution | FPS (OAK‑D, Host mode) | FPS (OAK‑D, Edge mode) |
|---|---|---|
| 180×320 | ~30 | ~35 |
| 270×480 | ~20 | ~25 |
| 360×640 | ~15 | ~18 |
- Host mode: DepthAI sends raw outputs via USB; the host CPU runs NMS. Simple to debug but uses CPU cycles.
- Edge mode: Postprocessing runs on the same VPU, reducing USB data transfer. Requires the second model blob. Best for latency‑sensitive applications.
- Model generation: a Docker container with PINTO's openvino2tensorflow tools converts ONNX to blob. The repo includes scripts to regenerate postprocessing models with custom thresholds.
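The postprocessing that Host mode runs on the CPU (and Edge mode moves into the second blob) boils down to score filtering followed by NMS. Here is a minimal NumPy sketch of that step — an illustration of the general technique, not the project's actual code:

```python
import numpy as np

def filter_detections(boxes, scores, score_thresh=0.6, iou_thresh=0.3):
    """Keep boxes above score_thresh, then greedily suppress overlaps (NMS).

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns indices (into the original arrays) of the surviving boxes.
    """
    idxs = np.flatnonzero(scores >= score_thresh)
    # Process remaining candidates in order of descending confidence.
    idxs = idxs[np.argsort(-scores[idxs])]
    survivors = []
    while idxs.size > 0:
        best = idxs[0]
        survivors.append(best)
        rest = idxs[1:]
        # Intersection-over-union of the best box against the rest.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        # Drop any box that overlaps the winner too much.
        idxs = rest[iou < iou_thresh]
    return survivors
```

Edge mode's advantage is simply that this filtering happens before the USB link, so only a handful of surviving boxes — rather than every raw candidate — crosses to the host.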
Quick Start and Usage
First, install dependencies and clone the repo:
git clone https://github.com/geaxgx/depthai_yunet.git
cd depthai_yunet
python3 -m pip install -r requirements.txt
To run face detection using the internal camera in Host mode:
python3 demo.py
To use a video file instead:
python3 demo.py -i path/to/video.mp4
For Edge mode with lower latency:
python3 demo.py -e
Press keys in the OpenCV window to toggle bounding boxes, landmarks, scores, and FPS overlay.
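Because the blobs run at a fixed resolution, detections come back in model‑input coordinates and must be scaled to the display frame before drawing. The demo handles this for you; a sketch of the mapping, for anyone building on the raw outputs (illustrative, not taken from the repo):

```python
def scale_box(box, model_res, frame_res):
    """Map an (x, y, w, h) box from model-input pixels to frame pixels.

    model_res and frame_res are (height, width) tuples,
    e.g. a 180x320 blob feeding a 720x1280 preview frame.
    """
    mh, mw = model_res
    fh, fw = frame_res
    x, y, w, h = box
    # Scale horizontal coordinates by width ratio, vertical by height ratio.
    return (x * fw / mw, y * fh / mh, w * fw / mw, h * fh / mh)
```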
Real‑world example
Detect faces in a recorded video and save the output:
python3 demo.py -i input_video.mp4 -o output_video.avi
For higher accuracy, switch to a larger model resolution:
python3 demo.py -i input_video.mp4 -mr 270x480 -o output.avi
For a real‑time application, use the internal camera and tweak --internal_fps and --model_resolution to balance FPS and detection quality. The default model (180×320) runs at ~30 FPS on OAK‑D.
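A fixed‑size blob also means arbitrary frames have to be fitted to the model input somehow. A common approach is letterboxing: scale to fit while preserving aspect ratio, then pad the remainder. The arithmetic below is a sketch of that general technique — an assumption about the preprocessing, not code lifted from the repo:

```python
def letterbox_params(frame_hw, model_hw):
    """Compute scale and symmetric padding to fit a frame into the model input.

    frame_hw and model_hw are (height, width) tuples.
    Returns (scale, (new_w, new_h), (pad_x, pad_y)).
    """
    fh, fw = frame_hw
    mh, mw = model_hw
    # Pick the factor that fits both dimensions without distortion.
    scale = min(mh / fh, mw / fw)
    new_h, new_w = round(fh * scale), round(fw * scale)
    pad_x = (mw - new_w) // 2   # left/right padding in pixels
    pad_y = (mh - new_h) // 2   # top/bottom padding in pixels
    return scale, (new_w, new_h), (pad_x, pad_y)
```

For example, a 480×640 webcam frame fed to the 180×320 blob is scaled by 0.375 to 180×240 and padded by 40 pixels on each side.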
Pros, Cons, and Alternatives
Pros
- Fast inference on the VPU, freeing up host CPU.
- Edge mode eliminates USB bandwidth bottlenecks.
- Pre‑built models for common resolutions; easy to generate custom ones.
- Clean Python API with OpenCV integration.
- Solid documentation and CLI help.
Cons
- Only works with DepthAI‑compatible hardware (OAK series).
- Fixed input resolutions; no dynamic sizing.
- Small community; maintenance appears inactive (last commit January 2022).
- Postprocessing model generation requires extra tooling (Docker, PyTorch).
Alternatives
- OpenCV DNN with YuNet: Runs on CPU/GPU, no DepthAI needed. Slower on edge but works anywhere OpenCV runs.
- MediaPipe Face Detection: Google's cross‑platform solution, optimized for mobile/edge. No DepthAI dependency, but not VPU‑accelerated.
- Intel Distribution of OpenVINO: Run any model on Intel hardware (CPU, GPU, VPU). More flexible but steeper learning curve.
Verdict: Should You Use It?
If you own an OAK camera and need reliable, real‑time face detection without hogging your host CPU, DepthAI YuNet is a no‑brainer. It's a polished, minimal project that just works. Skip it if you don't have DepthAI hardware, need dynamic input sizes, or require active development — consider OpenCV or MediaPipe instead. For its target niche, it's a solid choice.