Jiaang Duan, Shiyou Qian, Hanwen Hu, Dingyu Yang, Jian Cao, Guangtao Xue
The fusion of serverless computing and deep learning (DL) has given rise to serverless inference, a promising approach for developing and deploying scalable, cost-efficient deep learning inference services (DLISs). However, cold starts remain a significant obstacle for DLISs, as DL model size heavily influences startup latency. Existing studies mitigate cold starts by extending keep-alive times, which reduces resource utilization efficiency. To address this issue, we introduce PipeCo, a system designed to alleviate DLIS cold starts. The core idea of PipeCo is to miniaturize and pipeline the DLIS cold start. First, PipeCo uses vertical partitioning to divide each DLIS into multiple slices and prewarms the slices in a sequential, overlapping manner to reduce overall cold-start latency. Second, PipeCo employs an attention-based prediction mechanism to estimate periodic patterns in requests and idle containers for scheduling slices. Third, PipeCo incorporates a similarity-based container matcher to reuse idle containers. We implemented a prototype of PipeCo on the OpenFaaS platform and conducted extensive experiments on three real-world DLIS repositories. The results show that PipeCo reduces end-to-end (E2E) latency by up to 62.67% on CPU clusters and 58.81% on GPU clusters, and lowers overall resource usage by 65.31% compared to five state-of-the-art baselines.
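To make the pipelining idea concrete, the sketch below contrasts a monolithic cold start, where a whole model loads sequentially, with overlapped prewarming of vertically partitioned slices. This is a minimal illustrative simulation, not the authors' implementation: the slice count, per-slice load times, and function names are assumptions, and real prewarming would pull container images and model weights rather than sleep.

```python
# Hypothetical sketch of PipeCo-style overlapped slice prewarming.
# All names and timings here are illustrative assumptions, not the
# authors' API. The point: when each slice warms in its own container
# and warm-ups overlap, total cold-start latency trends toward the
# slowest slice instead of the sum of all slices.
import concurrent.futures
import time

SLICE_LOAD_SECONDS = [1.2, 0.9, 1.1]  # assumed per-slice load times

def prewarm_slice(idx: int, load_seconds: float) -> int:
    """Simulate pulling weights and initializing one model slice."""
    time.sleep(load_seconds)  # stand-in for weight loading + init
    return idx

def cold_start_monolithic() -> float:
    """Whole model loads slice by slice: latencies add up."""
    start = time.perf_counter()
    for i, s in enumerate(SLICE_LOAD_SECONDS):
        prewarm_slice(i, s)
    return time.perf_counter() - start

def cold_start_pipelined() -> float:
    """Slices prewarm in overlapping fashion across containers."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(prewarm_slice, i, s)
                   for i, s in enumerate(SLICE_LOAD_SECONDS)]
        concurrent.futures.wait(futures)
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"monolithic cold start: {cold_start_monolithic():.2f}s")
    print(f"overlapped cold start: {cold_start_pipelined():.2f}s")
```

Under these assumed timings, the monolithic path takes about 3.2 s while the overlapped path takes about 1.2 s, mirroring the latency reduction the abstract attributes to sequential, overlapping slice prewarming.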