In this work, we propose an adaptive host-chip architecture for video acquisition that optimizes object tracking under a given bit-rate constraint. The chip, an imaging instrument with limited computational power, consists of a very high-resolution focal plane array (FPA) that transmits quadtree (QT)-segmented video frames to the host. The host has unlimited computational power for video analysis. The QT segmentation compresses the frames. We find the optimal QT decomposition by using the Viterbi algorithm to solve a rate-distortion optimization problem. The distortion is computed between the current high-quality frame and the previous reconstructed frame, weighted in the regions of interest (RoIs) and normalized over the RoI area. The weights are user-defined based on the class of objects to track. Faster R-CNN, a deep-neural-network-based object detector, is used to detect objects of interest in each frame. A Kalman filter tracks the locations of the RoIs and predicts the locations of those objects based on a linear motion model. The predicted locations of the RoIs guide the Viterbi optimization to allocate priority (and hence more bits) to those regions. We evaluate our architecture’s performance using the Multiple Object Tracking Accuracy (MOTA) score.
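
To illustrate the kind of linear motion model the Kalman filter relies on, the following is a minimal sketch of a constant-velocity filter for a single coordinate of an RoI (for example, the x position of its centre). It is not the authors' implementation: the names `CVState`, `predict`, `update`, and `predict_next_position`, the diagonal process noise `q`, the measurement noise `r`, and the initial covariance are all assumptions chosen for illustration, and a real tracker would typically filter every bounding-box coordinate jointly.

```python
from dataclasses import dataclass


@dataclass
class CVState:
    """Position/velocity estimate along one axis, plus its 2x2 covariance."""
    p: float    # position, e.g. RoI centre x in pixels (assumed unit)
    v: float    # velocity in pixels per frame
    p00: float  # covariance entries P[0][0], P[0][1], P[1][0], P[1][1]
    p01: float
    p10: float
    p11: float


def predict(s, dt=1.0, q=1e-3):
    """Time update: x <- F x and P <- F P F^T + q*I, with F = [[1, dt], [0, 1]]."""
    p00 = s.p00 + dt * (s.p10 + s.p01) + dt * dt * s.p11 + q
    p01 = s.p01 + dt * s.p11
    p10 = s.p10 + dt * s.p11
    p11 = s.p11 + q
    return CVState(s.p + dt * s.v, s.v, p00, p01, p10, p11)


def update(s, z, r=1.0):
    """Measurement update for a position-only observation z (H = [1, 0])."""
    innovation = z - s.p          # difference between detection and prediction
    s_cov = s.p00 + r             # innovation variance
    k0 = s.p00 / s_cov            # Kalman gain for position
    k1 = s.p10 / s_cov            # Kalman gain for velocity
    return CVState(
        s.p + k0 * innovation,
        s.v + k1 * innovation,
        (1.0 - k0) * s.p00,
        (1.0 - k0) * s.p01,
        s.p10 - k1 * s.p00,
        s.p11 - k1 * s.p01,
    )


def predict_next_position(measurements, dt=1.0):
    """Filter a sequence of detected positions and predict the next frame's state."""
    # Start at the first detection with zero velocity and a large (assumed) uncertainty.
    s = CVState(measurements[0], 0.0, 100.0, 0.0, 0.0, 100.0)
    for z in measurements[1:]:
        s = update(predict(s, dt), z)
    return predict(s, dt)


if __name__ == "__main__":
    # Hypothetical detections of an object moving 2 pixels per frame.
    detections = [2.0 * k for k in range(20)]
    nxt = predict_next_position(detections)
    print(f"predicted position: {nxt.p:.2f}, velocity: {nxt.v:.2f}")
```

In a pipeline like the one described above, the position returned by the final `predict` call would play the role of the predicted RoI location that steers the bit allocation for the next frame; how that prediction is actually passed to the Viterbi optimization is not specified here.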
