Almost every conversation about 'AI for CCTV' assumes the AI runs in the cloud. The video streams up, gets processed, and an alert comes back. It's the default mental model, and for retail security, it's the wrong one.
Latency: the small numbers matter
When a concealment is in progress, the window between detection and intervention is short: a manager still has to reach the floor before the subject reaches the door. A round-trip to the cloud, even on a good connection, eats into that window and costs interceptions.
Edge inference removes the round-trip entirely. The detection happens at the camera, the alert goes out from the edge node, and the cloud only sees the event after the fact.
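To make the round-trip cost concrete, here is a minimal sketch that compares the two alert paths stage by stage. The `AlertPath` class, its field names, and every millisecond value are illustrative placeholders rather than measurements; substitute figures from your own network and hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertPath:
    """Per-stage delays, in milliseconds, from frame capture to alert (hypothetical model)."""
    name: str
    inference_ms: float
    uplink_ms: float = 0.0    # store -> cloud transfer of the video segment
    downlink_ms: float = 0.0  # cloud -> store return of the alert

    def total_ms(self) -> float:
        return self.inference_ms + self.uplink_ms + self.downlink_ms


def round_trip_penalty_ms(cloud: AlertPath, edge: AlertPath) -> float:
    """Extra delay the cloud path adds over the edge path."""
    return cloud.total_ms() - edge.total_ms()


if __name__ == "__main__":
    # Placeholder values only; not benchmarks of any real deployment.
    cloud = AlertPath("cloud", inference_ms=40.0, uplink_ms=120.0, downlink_ms=60.0)
    edge = AlertPath("edge", inference_ms=40.0)
    print(f"cloud path adds {round_trip_penalty_ms(cloud, edge):.0f} ms per alert")
```

Even if inference time is identical in both tiers, the uplink and downlink terms exist only on the cloud path, and those are the terms that grow when a store's connection is congested.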
Bandwidth: the economics matter
An IP camera at 1080p / 30fps generates around 4–8 Mbps of video. A store with twenty cameras streaming continuously generates 80–160 Mbps. Multiply that across an estate of 50+ stores and the aggregate is 4–8 Gbps or more of sustained upstream traffic, a bandwidth bill that can quickly dwarf any AI vendor's licence cost.
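The arithmetic is easy to check. The sketch below uses only the figures quoted above (4–8 Mbps per camera, twenty cameras, fifty stores); the function names and the 30-day month are assumptions made for illustration.

```python
def aggregate_mbps(per_camera_mbps: float, cameras: int, stores: int = 1) -> float:
    """Sustained upstream bandwidth in Mbps if every camera streams continuously."""
    return per_camera_mbps * cameras * stores


def monthly_terabytes(mbps: float, days: int = 30) -> float:
    """Convert a sustained bitrate in Mbps to a data volume in decimal terabytes."""
    seconds = days * 24 * 60 * 60
    bits = mbps * 1_000_000 * seconds
    return bits / 8 / 1e12


if __name__ == "__main__":
    for label, per_camera in (("low", 4.0), ("high", 8.0)):
        store = aggregate_mbps(per_camera, cameras=20)
        estate = aggregate_mbps(per_camera, cameras=20, stores=50)
        print(
            f"{label}: {store:.0f} Mbps per store, "
            f"{estate / 1000:.1f} Gbps across 50 stores, "
            f"about {monthly_terabytes(store):.1f} TB per store per 30 days"
        )
```

At the low end that is roughly 26 TB of upload per store per month before any retries or protocol overhead.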
Edge inference flips this. The camera streams to the local edge node (over the LAN, which is free), and only events go to the cloud: timestamped, scored, and accompanied by the relevant clip. Bandwidth requirements per store drop to a fraction of a typical broadband connection.
GDPR: the architectural choice that matters
Streaming raw video to the cloud means raw face data leaves the building. Under UK GDPR, that's a meaningful data-transfer event with consequences for your data-protection impact assessment, your sub-processor list, and your retention policy.
Edge inference keeps the raw video on-site. Only structured events (detection metadata, blurred clips, embeddings) reach the cloud. That makes your data minimisation story far easier to defend.
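One way to enforce that boundary in code is to build the cloud payload from an explicit allow-list, so raw pixels cannot be serialised by accident. This is a sketch under assumptions: the `LocalDetection` class, its field names, and `CLOUD_FIELDS` are hypothetical and do not describe QuantumEye's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

# Only these fields may leave the store (hypothetical allow-list).
CLOUD_FIELDS = (
    "store_id", "camera_id", "event_type", "score",
    "occurred_at", "blurred_clip_ref", "face_embedding",
)


@dataclass(frozen=True)
class LocalDetection:
    """Everything the edge node knows about a detection, including raw pixels."""
    store_id: str
    camera_id: str
    event_type: str
    score: float
    occurred_at: str            # ISO 8601 timestamp, UTC
    blurred_clip_ref: str       # reference to a clip that has already been blurred
    face_embedding: Optional[List[float]] = None
    raw_frame: bytes = b""      # stays on-site
    face_crop: bytes = b""      # stays on-site


def to_cloud_payload(detection: LocalDetection) -> str:
    """Serialise only allow-listed fields; raw pixel data is never included."""
    record = asdict(detection)
    return json.dumps({key: record[key] for key in CLOUD_FIELDS})
```

Because the payload is assembled from the allow-list rather than by deleting sensitive fields, a new raw-data field added to `LocalDetection` later stays on-site by default.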
What we run, where
QuantumEye's edge stack runs models exported to ONNX on a per-store edge node. The state machines (concealment, grab-and-run, restricted-zone breach) run there. Face detection runs there. Face embedding runs there. The cloud handles the things the cloud is good at: cross-store aggregation, vector search, audit storage, the dashboard.
- Per-camera ONNX inference for detection state machines
- On-device face detection + embedding (vector goes up; raw face crop does not)
- Multi-camera handoff (ByteTrack) running in the store
- Cloud handles: vector search (Qdrant via S3 Vectors), audit log, dashboard, alerts
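To illustrate what a detection state machine on the edge node might look like, here is a minimal sketch for the restricted-zone case. It is not QuantumEye's implementation: the class, the state names, and the `confirm_frames` threshold are assumptions, and the per-frame `person_in_zone` flag is presumed to come from an upstream detector and tracker.

```python
from enum import Enum


class ZoneState(Enum):
    CLEAR = "clear"
    ENTERED = "entered"
    BREACH = "breach"


class RestrictedZoneMonitor:
    """Raises a breach once a person has stayed in the zone for N consecutive frames."""

    def __init__(self, confirm_frames: int = 15):
        # 15 frames is about half a second at 30fps; a hypothetical default.
        self.confirm_frames = confirm_frames
        self.state = ZoneState.CLEAR
        self._frames_inside = 0

    def update(self, person_in_zone: bool) -> bool:
        """Feed one per-frame observation; return True only on the frame a breach is confirmed."""
        if not person_in_zone:
            self.state = ZoneState.CLEAR
            self._frames_inside = 0
            return False
        self._frames_inside += 1
        if self.state is ZoneState.BREACH:
            return False  # already alerted for this incursion
        if self._frames_inside >= self.confirm_frames:
            self.state = ZoneState.BREACH
            return True
        self.state = ZoneState.ENTERED
        return False
```

Requiring several consecutive frames before alerting is a simple debounce against single-frame false positives, and returning True only once per incursion keeps a lingering subject from generating an alert on every frame.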
When cloud inference makes sense
We're not absolutist about this. Some retail AI workloads belong in the cloud: long-running batch analytics over weeks of data, model retraining, cross-tenant benchmarking. The point is to choose the right tier for the right workload. Real-time security detection lives at the edge.