JOURNAL ARTICLE

DTCC: Multi-level dilated convolution with transformer for weakly-supervised crowd counting

Zhuangzhuang Miao, Yong Zhang, Peng Yuan, Haocheng Peng, Baocai Yin

Year: 2023 Journal: Computational Visual Media Vol: 9 (4) Pages: 859-873 Publisher: Springer Nature

Abstract

Crowd counting provides an important foundation for public security and urban management. Because crowd images contain small targets and large density variations, crowd counting is a challenging task. Mainstream methods usually apply convolutional neural networks (CNNs) to regress a density map, which requires annotations of individual persons as well as counts. Weakly-supervised methods avoid detailed labeling and require only image-level counts as annotations, but existing methods fail to achieve satisfactory performance because they usually ignore a global perspective field and multi-level information. We propose a weakly-supervised method, DTCC, which combines multi-level dilated convolution and transformer methods to realize end-to-end crowd counting. Its main components are a recursive Swin transformer and a multi-level dilated convolution regression head. The recursive Swin transformer combines a pyramid visual transformer with a fine-tuned recursive pyramid structure to capture deep multi-level crowd features, including global features. The multi-level dilated convolution regression head includes multi-level dilated convolution and a linear regression head for the feature extraction module. This module captures low- and high-level features simultaneously to enlarge the receptive field. In addition, two regression head fusion mechanisms realize dynamic and mean fusion counting. Experiments on four well-known crowd counting benchmark datasets (UCF_CC_50, ShanghaiTech, UCF_QNRF, and JHU-Crowd++) show that DTCC achieves results superior to other weakly-supervised methods and comparable to fully-supervised methods.
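
The abstract names three ideas that a small example may make more concrete: dilated convolution widening the receptive field, mean fusion of several regression heads, and supervision from the image-level count alone. The following is a minimal pure-Python sketch of those ideas, not the DTCC implementation. The function names, the 1-D setting, the 3-tap kernel, the dilation rates (1, 2, 3), and the L1 count loss are all assumptions chosen for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal: Sequence[float],
                   kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation form).

    Kernel taps are placed `dilation` positions apart, so the same number
    of weights covers a wider stretch of the input.
    """
    span = effective_kernel_size(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        outputs.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return outputs


def mean_fusion(counts: Sequence[float]) -> float:
    """Average the count estimates of several regression heads."""
    return sum(counts) / len(counts)


def count_l1_loss(predicted: float, target: float) -> float:
    """Count-level L1 loss (assumed here): no per-person annotation is used."""
    return abs(predicted - target)


if __name__ == "__main__":
    features = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    kernel = [1.0, 1.0, 1.0]
    # Hypothetical dilation rates; each branch sees a different context width.
    for rate in (1, 2, 3):
        print(rate, effective_kernel_size(3, rate),
              dilated_conv1d(features, kernel, rate))
    # Hypothetical per-head count estimates fused by their mean.
    fused = mean_fusion([100.0, 110.0, 120.0])
    print(fused, count_l1_loss(fused, 105.0))
```

With a 3-tap kernel, dilation rates 1, 2, and 3 cover 3, 5, and 7 input positions respectively, which is the sense in which stacking branches with different rates mixes narrow and wide context without adding weights. The dynamic fusion mentioned in the abstract is not sketched here because the abstract does not describe how its weights are computed.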

Keywords:
Computer science; Artificial intelligence; Feature extraction; Pattern recognition (psychology); Convolution (computer science); Regression; Computer vision; Artificial neural network; Mathematics; Statistics

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 1.82
Refs: 51
Citation Normalized Percentile: 0.82

Topics

Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Anomaly Detection Techniques and Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Fire Detection and Safety Systems
Physical Sciences →  Engineering →  Safety, Risk, Reliability and Quality

Related Documents

JOURNAL ARTICLE

Weakly supervised crowd counting with joint CNN and transformer network

Fusen Wang, Kai Liu, Ning Wei, Nong Sang, Xiaofeng Xia, Jun Sang

Journal: Pattern Recognition Year: 2025 Vol: 171 Pages: 112077-112077
JOURNAL ARTICLE

Crowd Counting by Multi-Scale Dilated Convolution Networks

Jingwei Dong, Ziqi Zhao, Tongxin Wang

Journal: Electronics Year: 2023 Vol: 12 (12) Pages: 2624-2624
JOURNAL ARTICLE

Multi-level Convolutional Transformer with Adaptive Ranking for Semi-supervised Crowd Counting

Xin Deng, Songjian Chen, Yifan Chen, Jie-Fang Xu

Journal: 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence Year: 2021 Pages: 1-7