We propose a method for improving detection precision (mAP) using prior knowledge about the scene geometry: we assume the scene is a plane with objects placed on it. We focus on autonomous robots, so given the robot's dimensions and the inclination angles of the camera, it is possible to predict the spatial scale for each pixel of the input frame. With a slightly modified YOLOv3-tiny we demonstrate that detection supplemented by the scale channel, further referred to as S, outperforms standard RGB-based detection with little computational overhead.
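The per-pixel scale described above can be derived from a pinhole-camera model: a ray through each pixel row is intersected with the ground plane, and the resulting depth fixes how many metres one pixel spans. The sketch below illustrates this idea under assumed names and conventions (camera height `h`, downward pitch angle `pitch`, intrinsics `fx`, `fy`, `cy`); the paper's exact formulation may differ.

```python
import numpy as np

def ground_scale_map(h, pitch, fx, fy, cy, height, width):
    """Illustrative per-pixel scale channel S (metres per pixel).

    Assumes a pinhole camera mounted h metres above a flat floor and
    tilted down by `pitch` radians; y points down, z points forward.
    """
    v = np.arange(height, dtype=float)
    dy = (v - cy) / fy            # ray slope: y (down) per unit z, one value per row
    c, s = np.cos(pitch), np.sin(pitch)
    ry = c * dy + s               # downward ray component after rotating by the pitch
    # A downward-pointing ray hits the floor at parameter t = h / ry;
    # rays at or above the horizon (ry <= 0) never hit it, so scale -> inf.
    t = np.where(ry > 1e-6, h / np.maximum(ry, 1e-6), np.inf)
    scale_row = t / fx            # metres spanned by one pixel at that depth
    # The scale depends only on the image row, so broadcast across columns.
    return np.broadcast_to(scale_row[:, None], (height, width)).copy()
```

The resulting `(height, width)` map can be stacked with the RGB channels as a fourth input channel for the detector. Rows near the horizon map to very large (or infinite) scales and would need clipping in practice.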