Exhaustive motion estimation with variable block sizes is the bottleneck for real-time HEVC encoding. Previous work has shown that CTU-based parallel processing incurs a significant loss in coding efficiency. This paper therefore focuses on parallel processing within a single CTU and presents efficient parallel algorithms that accelerate integer motion estimation (IME) and fractional motion estimation (FME) on the GPU: a multistep composite method for IME and the elimination of redundant computation for FME. Experimental results show that encoding time is reduced by up to 52.38% compared with the non-parallel baseline implementation.
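To make the cost of exhaustive search concrete, the following is a minimal sketch of a sequential full-search integer motion estimation for a single square block, using the sum of absolute differences (SAD) as the matching cost. It illustrates the kind of non-parallel search that GPU methods aim to accelerate; it is not the multistep composite method of this paper. The function names `sad` and `full_search`, the frame layout (lists of rows), and the choice of SAD as the cost are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """SAD between the size x size block of `cur` at (bx, by)
    and the size x size block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[by + dy][bx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Exhaustive integer-pel search around (bx, by).

    Returns (best_mv, best_sad), where best_mv = (mx, my) is the
    displacement into `ref` with the lowest SAD. Candidates that fall
    outside the reference frame are skipped (an assumption; real
    encoders usually pad the reference instead).
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = None, None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            rx, ry = bx + mx, by + my
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, size)
            # Strict comparison keeps the first candidate on ties.
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (mx, my), cost
    return best_mv, best_sad
```

Each candidate displacement is evaluated independently of the others, so the two outer loops are the natural place to expose parallelism, for example by assigning one candidate to one GPU thread. The work grows with the square of the search range and is repeated for every block size, which is why the exhaustive search dominates encoding time.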
Augusto Gomez, Jhon Henry Bolaños Perea, María Trujillo
Falei Luo, Siwei Ma, Juncheng Ma, Honggang Qi, Li Su, Wen Gao
Stefan Radicke, Jens-Uwe Hahn, Christos Grecos, Q. Wang