A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e., the bitrate required to transmit the signal should be as low as possible; (2) latency, i.e., encoding and decoding must be fast enough to enable communication with little or no noticeable delay; and (3) reconstruction quality of the signal. In this work, we propose an open-source, streamable, real-time neural audio codec that achieves strong performance along all three axes: it reconstructs highly natural-sounding 48 kHz speech signals while operating at only 12 kbps and running with less than 6 ms (GPU) / 10 ms (CPU) latency. We also demonstrate an efficient training paradigm for developing such neural audio codecs for real-world scenarios. Both objective and subjective evaluations on the VCTK corpus are provided. In summary, AudioDec is a well-developed plug-and-play benchmark for audio codec applications.
Sunghwan Ahn, Beom Jun Woo, Min Hyun Han, Chanyeong Moon, Nam Soo Kim
Yuhao Zhao, Maoshen Jia, Jiawei Ru, Lizhong Wang, Liang Wen
Zhengpu Zhang, Jianyuan Feng, Yongjian Mao, Yehang Zhu, Junjie Shi, Xuzhou Ye, Shilei Liu, Derong Liu, Chuanzeng Huang
Liang Xu, Jing Wang, Jianqian Zhang, Xiang Xie
Lingfeng Zhang, Lijiang Chen, Yuan Su, Chunfeng Cui, Qi Zhao