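Journal of Beijing University of Civil Engineering and Architecture

2026, 42(01): 137-148

A Weakly-Supervised Crowd Counting Method Guided by Detail Enhancement

1. School of Intelligence Science and Technology, Beijing University of Civil Engineering and Architecture 2. Beijing Key Laboratory of Super Intelligent Technology for Urban Architecture, Beijing University of Civil Engineering and Architecture

Foundation: National Natural Science Foundation of China (62271035); Beijing Natural Science Foundation (4232021)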

DOI: 10.19740/j.2096-9872.2026.01.15

Abstract:

As urbanization accelerates, demand for monitoring and analyzing high-density crowd scenes has surged. Weakly-supervised crowd counting methods, which rely only on count-level labels, have attracted increasing attention in recent years, but they still struggle with variation in crowd scale and interference from complex backgrounds. To address these problems, we propose a weakly-supervised crowd counting method guided by detail enhancement. Feature extraction is split into two branches. The main branch fuses a ResNet network with a Transformer to extract local and global features. The auxiliary branch contains a detail enhancement module that guides and strengthens the detail information in the image. An attention-weighted fusion module then fuses the features output by the two branches, and the fused features are passed to a counting regression module that predicts the crowd count. Experiments on several standard crowd counting datasets show that the proposed method counts crowds with high accuracy.

Keywords: crowd counting; CNN; Transformer; weakly-supervised learning; attention mechanism
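
The last two stages named in the abstract, attention-weighted fusion of the two branch outputs followed by count regression, can be illustrated with a toy sketch. This is not the authors' implementation: the feature vectors stand in for the outputs of the main and auxiliary branches, the attention logits and the linear head weights are hypothetical values that a real model would learn, and the L1 loss against the count-level label is an assumption, since the abstract does not state the loss.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighted_fusion(main_feat, detail_feat, score_main, score_detail):
    """Fuse two equal-length feature vectors channel by channel.

    score_main and score_detail are per-channel attention logits
    (hypothetical here; a trained model would predict them).
    """
    fused = []
    for fm, fd, sm, sd in zip(main_feat, detail_feat, score_main, score_detail):
        w_main, w_detail = softmax([sm, sd])  # the two weights sum to 1
        fused.append(w_main * fm + w_detail * fd)
    return fused


def regress_count(fused, head_weights, head_bias):
    """Linear head mapping the fused features to one scalar count."""
    return sum(w * f for w, f in zip(head_weights, fused)) + head_bias


def count_l1_loss(predicted_count, label_count):
    """Count-level supervision: only the total count is compared (assumed L1)."""
    return abs(predicted_count - label_count)


# Toy inputs standing in for the two branch outputs.
main_feat = [2.0, 4.0]
detail_feat = [4.0, 0.0]
# Equal logits give each branch a weight of 0.5 on every channel.
fused = attention_weighted_fusion(main_feat, detail_feat, [0.0, 0.0], [0.0, 0.0])
count = regress_count(fused, head_weights=[10.0, 5.0], head_bias=0.0)
loss = count_l1_loss(count, label_count=42)
print(fused, count, loss)  # [3.0, 2.0] 40.0 2.0
```

The point of the sketch is that supervision touches only the scalar count: no point annotations or density maps enter the loss, which is what makes the setting weakly supervised.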

Basic information:

CLC number: TP391.41; TP18

Citation:

ZHANG D, CAI Y H. A weakly-supervised crowd counting method guided by detail enhancement[J]. Journal of Beijing University of Civil Engineering and Architecture, 2026, 42(01): 137-148. DOI: 10.19740/j.2096-9872.2026.01.15. (in Chinese)

Submitted: 2025-06-24

Final review: 2026-03-23

Released: 2026-02-25

Published: 2026-02-25