We've hit a wall on the latency vs. resolution trade-off and I'd love to hear some battle-tested opinions from the CV/embedded community.
The constraint: We need HD resolution to detect small targets at range, but running inference on full HD frames kills our control loop frequency (target: <20 ms glass-to-motor response).
We are debating two architectural paths:
Option A: Static Tiling (SAHI-style). Slice the HD frame into overlapping tiles and run inference on each tile (rough sketch below).
Pro: High detection probability for small objects.
Con: Even with NMS-free architectures, the inference time on the DSP effectively triples. Latency spikes cause our Proportional Navigation guidance to oscillate.
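For concreteness, a minimal sketch of the tiling path (Python; `run_inference` is a hypothetical per-tile detector call, and the tile size/overlap are just illustrative, not our actual numbers):

```python
# Minimal sketch of static tiling, not production code.
# Assumes a 1920x1080 numpy frame and a hypothetical
# run_inference(crop) -> [(x, y, w, h, score), ...] in crop coordinates.

def tile_starts(length, tile, step):
    # Regular grid of tile origins, plus one extra so the last tile touches the edge.
    starts = list(range(0, max(length - tile, 0) + 1, step))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts

def detect_tiled(frame, run_inference, tile=640, overlap=0.2):
    # Slice the HD frame into overlapping tiles, run the detector per tile,
    # and shift each box back into full-frame coordinates.
    h, w = frame.shape[:2]
    step = int(tile * (1 - overlap))
    detections = []
    for y0 in tile_starts(h, tile, step):
        for x0 in tile_starts(w, tile, step):
            crop = frame[y0:y0 + tile, x0:x0 + tile]
            for (x, y, bw, bh, score) in run_inference(crop):
                detections.append((x + x0, y + y0, bw, bh, score))
    return detections  # cross-tile merging / NMS would go here
```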
Option B: Dynamic ROI ("The Sniper Approach"). Run a low-res global search (320x320) at high FPS. Once a target is found, lock a dynamic high-res Region of Interest (ROI) from the raw camera stream and run inference only on that crop (rough sketch below).
Pro: Extremely fast. Keeps the loop tight.
Con: Single Point of Failure. If the tracker (Kalman Filter) loses the crop due to abrupt ego-motion, we are blind until global search re-acquires. In a terminal phase intercept, that’s a miss.
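And a minimal sketch of the ROI loop (Python; `detect_lowres` / `detect_crop` are hypothetical detector calls returning a target center or None, and the alpha-beta predictor is a simplified stand-in for the Kalman filter):

```python
# Minimal sketch of the dynamic-ROI loop, not production code.
import numpy as np

class AlphaBetaTracker:
    # Constant-velocity predictor for the ROI center (stand-in for a KF).
    def __init__(self, alpha=0.85, beta=0.005):
        self.alpha, self.beta = alpha, beta
        self.pos = None                     # np.array([cx, cy]) once initialised
        self.vel = np.zeros(2)

    def predict(self, dt):
        if self.pos is not None:
            self.pos = self.pos + self.vel * dt
        return self.pos

    def update(self, measurement, dt):
        z = np.asarray(measurement, dtype=float)
        if self.pos is None:
            self.pos = z
            return
        residual = z - self.pos
        self.pos = self.pos + self.alpha * residual
        self.vel = self.vel + (self.beta / dt) * residual

def roi_around(center, frame_shape, size=512):
    # Clamp a size x size crop around the predicted center to the frame bounds.
    h, w = frame_shape[:2]
    x0 = int(min(max(center[0] - size // 2, 0), w - size))
    y0 = int(min(max(center[1] - size // 2, 0), h - size))
    return x0, y0, size

def step(frame_hd, tracker, dt, misses, detect_lowres, detect_crop, max_misses=5):
    # One tick: try the predicted high-res ROI first; after too many misses
    # (or before first lock), fall back to the cheap low-res global search.
    if tracker.pos is not None and misses < max_misses:
        tracker.predict(dt)
        x0, y0, size = roi_around(tracker.pos, frame_hd.shape)
        hit = detect_crop(frame_hd[y0:y0 + size, x0:x0 + size])
        if hit is not None:
            tracker.update((hit[0] + x0, hit[1] + y0), dt)
            return tracker.pos, 0
        return tracker.pos, misses + 1      # blind tick: coast on the prediction
    hit = detect_lowres(frame_hd)           # e.g. frame resized to 320x320 internally
    if hit is not None:
        tracker.update(hit, dt)
        return tracker.pos, 0
    return None, misses + 1
```

The painful knob is max_misses: how long to coast on the prediction before paying for a full re-acquire.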
Has anyone here successfully implemented robust Dynamic ROI on edge silicon (Jetson/Hexagon DSP) for erratic targets? Are we over-engineering this, or is full-frame HD inference simply dead on arrival for real-time guidance?
Any pointers to papers or repos are appreciated.
PS: If you live for these kinds of problems (and enjoy solving them in Munich), we are looking for a Founding Engineer to own this entire pipeline. Email in profile.