The first iteration of our concealment detection used the model's off-the-shelf thresholds, which were tuned on its original training data, mostly US retail footage. Detection rates were fine, but the false-positive rates were unacceptable for UK convenience stores.
Why the defaults didn't work
Three differences between US retail footage and UK convenience-store footage drive most of the gap:
- Aisle width. UK convenience aisles are narrower; cameras see more arm motion that isn't concealment but looks like it on a per-frame basis.
- Bag policy. UK shoppers routinely carry rucksacks and shoulder bags into convenience stores, while US grocery customers more often use trolleys, so there are far more bags in frame for a hand to overlap with.
- Camera placement. UK convenience cameras are often higher and at steeper angles than the US grocery footage the defaults were trained on.
The three thresholds we tuned
Hand-product IoU
The detection fires when the hand bounding box and a product bounding box overlap above a threshold for some number of consecutive frames. The default IoU threshold was too low: it fired on transient gestures that looked like a pickup but were actually people putting a product back. We raised the threshold and required the overlap to persist for longer.
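As a rough illustration of the rule above, here is a minimal Python sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2). The names `iou` and `PickupDetector`, and the values 0.3 and 5, are hypothetical placeholders chosen for the example, not our production code or thresholds.

```python
from dataclasses import dataclass

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class PickupDetector:
    iou_threshold: float = 0.3  # hypothetical value
    min_frames: int = 5         # hypothetical value
    streak: int = 0             # consecutive frames above the threshold

    def update(self, hand: Box, product: Box) -> bool:
        """Feed one frame; return True once the overlap has persisted."""
        if iou(hand, product) >= self.iou_threshold:
            self.streak += 1
        else:
            self.streak = 0  # any gap resets the persistence requirement
        return self.streak >= self.min_frames
```

Raising `iou_threshold` filters out glancing overlaps, and raising `min_frames` filters out brief ones. Resetting the streak on any miss is the strictest choice; a tracker that drops frames might need a small tolerance instead.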
Hand-bag IoU + concealment frame count
The concealment trigger fires when a hand carrying a product overlaps with a bag region for some number of frames. The default frame count was too short: it fired on people taking a product out of their bag to compare prices. Raising the frame count and adding a product-lost-timeout fixed it.
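One way to picture how the two settings combine is sketched below in Python. It rests on an assumption: that the product-lost-timeout means the product must stay out of view for a number of frames after the sustained hand-bag overlap before the trigger fires. The class name, field names, and default values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConcealmentTrigger:
    bag_iou_threshold: float = 0.2   # hypothetical value
    min_overlap_frames: int = 8      # hypothetical value
    product_lost_timeout: int = 15   # hypothetical value, in frames
    overlap_streak: int = 0          # consecutive frames of hand-bag overlap
    lost_streak: int = 0             # consecutive frames product is unseen
    armed: bool = False              # True after a sustained overlap

    def update(self, hand_bag_iou: float, product_visible: bool) -> bool:
        """Feed one frame; return True when concealment is flagged."""
        if hand_bag_iou >= self.bag_iou_threshold:
            self.overlap_streak += 1
        else:
            self.overlap_streak = 0
        if self.overlap_streak >= self.min_overlap_frames:
            self.armed = True

        if product_visible:
            self.lost_streak = 0
            # Assumption: a product seen again away from the bag was
            # taken back out (e.g. to compare prices), so disarm.
            if self.overlap_streak == 0:
                self.armed = False
        elif self.armed:
            self.lost_streak += 1

        return self.armed and self.lost_streak >= self.product_lost_timeout
```

Under this reading, a shopper who lifts a product out of a bag and keeps it in view never accumulates enough lost frames to fire the trigger.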
Confidence decay
Once a person is in the 'carrying' state, their confidence score decays on every frame in which they are not interacting with anything. The default decay was too slow: people who put a product back kept a high carrying score for too long. We doubled the decay rate.
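The effect of the decay rate can be sketched in a few lines of Python. This assumes a multiplicative per-frame decay, which is only one plausible form; the function names and the rates used in the usage note are hypothetical.

```python
def decay_score(score: float, idle_frames: int, decay_rate: float) -> float:
    """Score after idle_frames frames without interaction.

    Assumes multiplicative decay: each idle frame keeps a fraction
    (1 - decay_rate) of the previous score.
    """
    return score * (1.0 - decay_rate) ** idle_frames


def frames_until_below(score: float, floor: float, decay_rate: float) -> int:
    """Count idle frames until the score first drops below floor."""
    if not 0.0 < decay_rate < 1.0 or floor <= 0.0:
        raise ValueError("need 0 < decay_rate < 1 and floor > 0")
    frames = 0
    while score >= floor:
        score *= 1.0 - decay_rate
        frames += 1
    return frames
```

Comparing `frames_until_below(1.0, 0.5, 0.25)` with `frames_until_below(1.0, 0.5, 0.5)` shows the direction of the change: the doubled rate clears the carrying state in fewer idle frames.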
The result
False-positive rates dropped meaningfully across our pilot sites without measurable loss in real-concealment detection. We continue to tune per site and per camera as we collect more data.