Edge AI vs cloud AI for retail security

Almost every conversation about 'AI for CCTV' assumes the AI runs in the cloud. The video streams up, gets processed, and an alert comes back. It's the default mental model, and for retail security, it's the wrong one.

Latency: the small numbers matter

While a concealment is in progress, the window between detection and intervention is short: if the alert arrives late, the subject is at the door by the time the manager reaches the floor. A round-trip to the cloud, even on a good connection, adds delay that costs interceptions.

Edge inference removes the round-trip entirely. The detection happens at the camera, the alert goes out from the edge node, and the cloud only sees the event after the fact.
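To make the round-trip argument concrete, here is a minimal latency-budget sketch in Python. Every millisecond value and stage name in it is a placeholder assumption chosen for illustration, not a measurement from any deployment; substitute figures from your own stores.

```python
# Illustrative latency budget. All millisecond values are placeholder
# assumptions, not measurements; the stage names are hypothetical too.

EDGE_PATH_MS = {
    "capture_and_decode": 50,
    "inference_on_edge_node": 80,
    "alert_dispatch": 100,
}

CLOUD_PATH_MS = {
    "capture_and_encode": 50,
    "uplink_to_cloud": 150,      # assumed; varies with store connectivity
    "queue_and_decode": 100,     # assumed
    "inference_in_cloud": 80,
    "alert_dispatch": 100,
}


def total_latency_ms(stages: dict[str, int]) -> int:
    """Sum the per-stage latencies for one detection-to-alert path."""
    return sum(stages.values())


edge_ms = total_latency_ms(EDGE_PATH_MS)
cloud_ms = total_latency_ms(CLOUD_PATH_MS)
round_trip_penalty_ms = cloud_ms - edge_ms

print(f"edge path:  {edge_ms} ms")
print(f"cloud path: {cloud_ms} ms")
print(f"penalty:    {round_trip_penalty_ms} ms")
```

The useful part is the structure rather than the numbers: the cloud path carries every stage the edge path has, plus the uplink and queueing stages, so its total can only be larger.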

Bandwidth: the economics matter

An IP camera at 1080p / 30fps generates around 4–8 Mbps of video. A store with twenty cameras streaming continuously generates 80–160 Mbps. Multiply that across an estate of 50+ stores and the aggregate is at least 4–8 Gbps of sustained upload, a bandwidth bill that can dwarf an AI vendor's licence cost.

Edge inference flips this. The camera streams to the local edge node over the LAN, at no metered cost, and only events (timestamped, scored, and accompanied by the relevant clip) go to the cloud. Per-store bandwidth requirements drop to a fraction of a typical broadband connection.
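The arithmetic behind this comparison can be sketched in a few lines of Python. The streaming figures (4–8 Mbps per camera, twenty cameras, fifty stores) come from the text above; the event count and clip size for the edge case are hypothetical assumptions, included only to show the shape of the calculation.

```python
# Figures from the text: 4-8 Mbps per 1080p/30fps camera,
# 20 cameras per store, 50 stores in the estate.
CAMERA_MBPS = (4.0, 8.0)
CAMERAS_PER_STORE = 20
STORES = 50

# Assumptions for the edge case (hypothetical; tune to your own data).
EVENTS_PER_STORE_PER_DAY = 40   # assumed number of uploaded events
CLIP_MEGABYTES = 5.0            # assumed size of one clip plus metadata
SECONDS_PER_DAY = 24 * 60 * 60


def streaming_mbps(per_camera_mbps: float, cameras: int, stores: int = 1) -> float:
    """Sustained upload if every camera streams to the cloud continuously."""
    return per_camera_mbps * cameras * stores


def event_uplink_mbps(events_per_day: int, clip_megabytes: float) -> float:
    """Average upload if only event clips leave the store (1 byte = 8 bits)."""
    megabits_per_day = events_per_day * clip_megabytes * 8
    return megabits_per_day / SECONDS_PER_DAY


store_low, store_high = (streaming_mbps(r, CAMERAS_PER_STORE) for r in CAMERA_MBPS)
estate_low, estate_high = (
    streaming_mbps(r, CAMERAS_PER_STORE, STORES) for r in CAMERA_MBPS
)
edge_avg = event_uplink_mbps(EVENTS_PER_STORE_PER_DAY, CLIP_MEGABYTES)

print(f"Cloud streaming per store: {store_low:.0f}-{store_high:.0f} Mbps")
print(f"Cloud streaming, {STORES} stores: "
      f"{estate_low / 1000:.0f}-{estate_high / 1000:.0f} Gbps")
print(f"Edge events per store (daily average): {edge_avg:.3f} Mbps")
```

The edge figure is a daily average, so it hides bursts: each clip upload briefly uses more than the average. Even so, under these assumptions the gap between the two tiers is several orders of magnitude.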

GDPR: the architectural choice that matters

Streaming raw video to the cloud means raw face data leaves the building. Under UK GDPR, that's a meaningful data-transfer event with consequences for your data-protection impact assessment, your sub-processor list, and your retention policy.

Edge inference keeps the raw video on-site. Only structured events (detection metadata, blurred clips, embeddings) reach the cloud. Your data minimisation story is immediately defensible.
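As an illustration of what a structured event might look like, here is a minimal Python sketch. The `DetectionEvent` class, its field names, and the example values are all hypothetical and are not taken from any real schema; the point is what the record leaves out.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DetectionEvent:
    """Hypothetical event record uploaded from the edge node.

    Note what is absent: no raw frames and no unblurred face crops.
    """
    store_id: str
    camera_id: str
    event_type: str          # e.g. "concealment"
    score: float             # detector confidence in [0, 1]
    occurred_at: str         # ISO 8601 UTC timestamp
    blurred_clip_ref: str    # pointer to an already-blurred clip
    embedding: list[float] = field(default_factory=list)


def to_upload_payload(event: DetectionEvent) -> str:
    """Serialise an event to JSON, rejecting out-of-range scores."""
    if not 0.0 <= event.score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return json.dumps(asdict(event), sort_keys=True)


event = DetectionEvent(
    store_id="store-001",
    camera_id="cam-07",
    event_type="concealment",
    score=0.91,
    occurred_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    blurred_clip_ref="clips/store-001/cam-07/example.mp4",  # made-up path
    embedding=[0.12, -0.03, 0.44],  # real embeddings are much longer
)
payload = to_upload_payload(event)
print(payload)
```

Because the payload type has no field for raw video, sending a frame to the cloud would require a schema change rather than a configuration slip, which is the kind of property a DPIA can point to.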

The DPIA conversation with your DPO is fundamentally easier when the continuous camera feed never leaves the store. The cloud sees events, not faces in the wild.

What we run, where

QuantumEye's edge stack runs models exported to ONNX on a per-store edge node. The state machines (concealment, grab-and-run, restricted-zone breach) run there. Face detection runs there. Face embedding runs there. The cloud handles the things the cloud is good at: cross-store aggregation, vector search, audit storage, the dashboard.

Per-camera ONNX inference for detection state machines
On-device face detection + embedding (vector goes up; raw face crop does not)
Multi-camera handoff (ByteTrack) running in the store
Cloud handles: vector search (Qdrant via S3 Vectors), audit log, dashboard, alerts
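To give a feel for what a detection state machine on the edge node could look like, here is a minimal Python sketch of a restricted-zone breach detector. It is not QuantumEye's implementation: the class names, the rectangular zone, and the dwell threshold are all assumptions made for illustration, and the input is taken to be one tracked position per frame from whatever tracker runs upstream.

```python
from dataclasses import dataclass
from enum import Enum


class ZoneState(Enum):
    OUTSIDE = "outside"
    INSIDE = "inside"
    BREACH = "breach"


@dataclass
class RestrictedZoneMachine:
    """Hypothetical per-track state machine for a restricted-zone breach."""
    # Zone as an axis-aligned rectangle in pixel coordinates (x1, y1, x2, y2).
    zone: tuple[float, float, float, float]
    dwell_frames: int = 15   # assumed frames inside before raising a breach
    state: ZoneState = ZoneState.OUTSIDE
    frames_inside: int = 0

    def _contains(self, x: float, y: float) -> bool:
        x1, y1, x2, y2 = self.zone
        return x1 <= x <= x2 and y1 <= y <= y2

    def update(self, x: float, y: float) -> bool:
        """Feed one tracked position; return True on the frame a breach is raised."""
        if not self._contains(x, y):
            # Leaving the zone resets the machine for the next visit.
            self.state = ZoneState.OUTSIDE
            self.frames_inside = 0
            return False
        self.frames_inside += 1
        if self.state is ZoneState.BREACH:
            return False     # already alerted for this visit
        if self.frames_inside >= self.dwell_frames:
            self.state = ZoneState.BREACH
            return True
        self.state = ZoneState.INSIDE
        return False


machine = RestrictedZoneMachine(zone=(100, 100, 300, 300), dwell_frames=3)
track = [(50, 50), (150, 150), (160, 150), (170, 150), (180, 150), (400, 400)]
alerts = [machine.update(x, y) for x, y in track]
print(alerts)  # [False, False, False, True, False, False]
```

The dwell threshold keeps someone who clips the corner of the zone from triggering an alert, and the machine alerts once per visit rather than on every frame. Both choices belong on the edge node, since they need every frame and produce only an occasional event.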

When cloud inference makes sense

We're not categorical. Some retail AI workloads (long-running batch analytics over weeks of data, model retraining, cross-tenant benchmarking) belong in the cloud. The point is to choose the right tier for the right workload. Real-time security detection lives at the edge.
