Jiaang Duan, Shiyou Qian, Hanwen Hu, Dingyu Yang, Jian Cao, Guangtao Xue
The fusion of serverless computing and deep learning (DL) has given rise to serverless inference, a promising approach for developing and deploying scalable, cost-efficient deep learning inference services (DLISs). However, cold starts remain a significant obstacle for DLISs, as DL model size heavily influences startup latency. Existing studies mitigate cold starts by extending keep-alive times, which reduces resource utilization efficiency. To address this issue, we introduce PipeCo, a system designed to alleviate DLIS cold starts. The core idea of PipeCo is to miniaturize and pipeline the DLIS cold start. First, PipeCo uses vertical partitioning to divide each DLIS into multiple slices and prewarms the slices in a sequential, overlapping manner to reduce overall cold-start latency. Second, PipeCo employs an attention-based prediction mechanism to estimate periodic patterns in requests and idle containers for scheduling slices. Third, PipeCo incorporates a similarity-based container matcher to reuse idle containers. We implemented a prototype of PipeCo on the OpenFaaS platform and conducted extensive experiments on three real-world DLIS repositories. The results show that PipeCo reduces end-to-end (E2E) latency by up to 62.67% on CPU clusters and 58.81% on GPU clusters, and lowers overall resource usage by 65.31% compared to five state-of-the-art baselines.
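To make the pipelining idea concrete, the sketch below contrasts a monolithic cold start, where a whole model loads sequentially, with overlapped prewarming of vertically partitioned slices. This is a minimal illustrative simulation, not the authors' implementation: the slice count, per-slice load times, and function names are assumptions, and real prewarming would pull container images and model weights rather than sleep.

```python
# Hypothetical sketch of PipeCo-style overlapped slice prewarming.
# All names and timings here are illustrative assumptions, not the
# authors' API. The point: when each slice warms in its own container
# and warm-ups overlap, total cold-start latency trends toward the
# slowest slice instead of the sum of all slices.
import concurrent.futures
import time

SLICE_LOAD_SECONDS = [1.2, 0.9, 1.1]  # assumed per-slice load times

def prewarm_slice(idx: int, load_seconds: float) -> int:
    """Simulate pulling weights and initializing one model slice."""
    time.sleep(load_seconds)  # stand-in for weight loading + init
    return idx

def cold_start_monolithic() -> float:
    """Whole model loads slice by slice: latencies add up."""
    start = time.perf_counter()
    for i, s in enumerate(SLICE_LOAD_SECONDS):
        prewarm_slice(i, s)
    return time.perf_counter() - start

def cold_start_pipelined() -> float:
    """Slices prewarm in overlapping fashion across containers."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(prewarm_slice, i, s)
                   for i, s in enumerate(SLICE_LOAD_SECONDS)]
        concurrent.futures.wait(futures)
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"monolithic cold start: {cold_start_monolithic():.2f}s")
    print(f"overlapped cold start: {cold_start_pipelined():.2f}s")
```

Under these assumed timings, the monolithic path takes about 3.2 s while the overlapped path takes about 1.2 s, mirroring the latency reduction the abstract attributes to sequential, overlapping slice prewarming.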