JOURNAL ARTICLE

CCBERT: Self-Supervised Code Change Representation Learning

Abstract

Developers make numerous code changes in their daily work, and effective code change analysis requires a high-quality representation of those changes. Recently, Hoang et al. proposed CC2Vec, a neural network-based approach that learns a distributed representation of code changes to capture their semantic intent. Despite its demonstrated effectiveness in multiple tasks, CC2Vec has two limitations: 1) it considers only coarse-grained information about code changes, and 2) it relies on log messages rather than the self-contained content of the code changes. In this work, we propose CCBERT (Code Change BERT), a new Transformer-based pre-trained model that learns a generic representation of code changes from a large-scale dataset of unlabeled code changes. CCBERT is pre-trained on four proposed self-supervised objectives that are specialized for learning code change representations from the contents of code changes. CCBERT perceives fine-grained code changes at the token level by learning from the old and new versions of the content, along with the edit actions. Our experiments demonstrate that CCBERT significantly outperforms CC2Vec or the state-of-the-art approaches for the downstream tasks by 7.7%–14.0%, depending on the metric and task. CCBERT also consistently outperforms large pre-trained code models, such as CodeBERT, while requiring 6–10× less training time, 5–30× less inference time, and 7.9× less GPU memory.
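
To make the idea of token-level edit actions concrete, the following is a minimal sketch of how the old and new versions of a snippet might be aligned token by token and labeled with an edit action. It is an illustration only, not CCBERT's actual preprocessing: the regex tokenizer, the function names `tokenize` and `edit_actions`, and the action labels (`equal`, `replace`, `insert`, `delete`) are assumptions, and the alignment is delegated to Python's standard `difflib.SequenceMatcher`.

```python
import difflib
import re


def tokenize(code: str) -> list[str]:
    # Naive tokenizer (an assumption for this sketch): identifiers,
    # integer literals, or any single non-whitespace symbol.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def edit_actions(old_code: str, new_code: str) -> list[tuple[str, str, str]]:
    """Align old and new tokens and label each position.

    Returns (action, old_token, new_token) triples; an empty string
    marks a side that has no token at that position.
    """
    old_tokens = tokenize(old_code)
    new_tokens = tokenize(new_code)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)

    aligned: list[tuple[str, str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_span, new_span = old_tokens[i1:i2], new_tokens[j1:j2]
        if tag == "equal":
            aligned += [("equal", t, t) for t in old_span]
        elif tag == "delete":
            aligned += [("delete", t, "") for t in old_span]
        elif tag == "insert":
            aligned += [("insert", "", t) for t in new_span]
        else:  # "replace": pair tokens up, padding the shorter side
            for k in range(max(len(old_span), len(new_span))):
                o = old_span[k] if k < len(old_span) else ""
                n = new_span[k] if k < len(new_span) else ""
                action = "replace" if o and n else ("delete" if o else "insert")
                aligned.append((action, o, n))
    return aligned


if __name__ == "__main__":
    for triple in edit_actions("return a + b;", "return a - b;"):
        print(triple)
```

A model that consumes triples like these sees which tokens were kept, replaced, inserted, or deleted, rather than only whole added or removed lines. That is the kind of fine-grained signal the abstract contrasts with coarse-grained representations.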

Keywords:
Computer science, Code (set theory), Artificial intelligence, Inference, Representation (politics), Security token, Code review, Machine learning, Source code, Natural language processing, Programming language, Static program analysis, Software, Software development

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 5.57
Refs: 61
Citation Normalized Percentile: 0.95

Topics

Software Engineering Research
Physical Sciences →  Computer Science →  Information Systems
Advanced Malware Detection Techniques
Physical Sciences →  Computer Science →  Signal Processing
Software System Performance and Reliability
Physical Sciences →  Computer Science →  Computer Networks and Communications

Related Documents

JOURNAL ARTICLE

Self-Supervised Code Change Representation Learning

Anonymous

Journal: Zenodo (CERN European Organization for Nuclear Research) Year: 2023

JOURNAL ARTICLE

Self-Distilled Self-supervised Representation Learning

Jiho Jang, Seonhoon Kim, KiYoon Yoo, Chaerin Kong, Jangho Kim, Nojun Kwak

Journal: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Year: 2023 Pages: 2828-2838

JOURNAL ARTICLE

Self-supervised Hypergraph Representation Learning

Boxin Du, Changhe Yuan, Robert A. Barton, Tal Neiman, Hanghang Tong

Journal: 2022 IEEE International Conference on Big Data (Big Data) Year: 2022 Pages: 505-514