

The DeepStream NvDsPreprocess plugin enables you to invest compute over a specific area of the frame. In this case, if the detection is applied over the entire frame, it is an unnecessary use of compute resources. You might find that the region of interest is a small area and not the entire frame. Typically, receptive fields vary from 256×256 to 1280×1280 and beyond in the detection and segmentation networks. This fixed-size input is also known as the receptive field of the model. It takes a huge amount of computing power to process such a huge number of pixels, especially with a deep neural network.īased on the input dimension selected during model building, deep neural networks operate on the fixed shape input. With the increase in resolution, the number of pixels increases exponentially. The military uses aerial photography for various purposes and that also has a large area covered. Nowadays, 4K and 8K cameras are used to capture details of the scene. Video surveillance systems are used to solve various problems such as the identification of pedestrians, vehicles, and cars. In this post, we discuss at how NVIDIA DeepStream can help in applying smaller models onto high-resolution input to detect a specific frame region. Smaller models are also considered edge-deployment-friendly. No retraining is required to apply the model to the high-resolution input. Smaller models are used so lesser computation power is required in training and inference. The second method, dividing the entire image into tiles and applying smaller models to each tile has obvious advantages. Larger models are deemed unfit for edge deployment on smaller devices. Training or deploying such a model also requires more computing resources. Training a model with large input often requires larger backbones, making the overall model bulkier. In many ways, the first approach is difficult. Divide the large image into tiles and apply the smaller model to each one.Use a large model with a high input resolution.When a certain area of the frame is of interest, inference over the complete frame is unnecessary.

Detecting objects in high-resolution input is a well-known problem in computer vision.
