This study introduces Masked Collaborative Contrast (MCC), an effective approach for emphasizing semantic regions in weakly supervised semantic segmentation. MCC combines ideas from masked image modeling and contrastive learning to design Transformer blocks that encourage keys to contract toward semantically relevant regions. Unlike prevalent techniques that generate masks by directly erasing patch regions in the input image, we exploit the neighborhood relations of patch tokens by deriving masks from the key-based affinity matrix. Moreover, we generate positive and negative samples for contrastive learning by contrasting the masked local output with the global output. Extensive experiments on commonly used datasets demonstrate that the proposed MCC mechanism effectively aligns global and local views of the image, achieving strong performance.
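The two ingredients can be illustrated with a toy NumPy sketch. All shapes, the top-k neighbor rule, and the pooling are illustrative assumptions, not the paper's exact formulation: keys are masked by keeping only high-affinity neighbors in the patch-token affinity matrix, and an InfoNCE-style loss pulls the masked local representation toward the global one while pushing it away from negatives.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, k = 16, 8, 4  # patch tokens, key dim, neighbors kept (toy sizes)

keys = rng.normal(size=(N, D))
affinity = keys @ keys.T                       # token-to-token affinity

# Mask on the affinity matrix: keep each token's top-k neighbors (assumed rule)
mask = np.zeros_like(affinity)
topk = np.argsort(-affinity, axis=1)[:, :k]
np.put_along_axis(mask, topk, 1.0, axis=1)
masked_aff = affinity * mask

# Global view vs. masked local view, pooled into single vectors
global_repr = keys.mean(axis=0)
local_repr = (masked_aff @ keys).mean(axis=0)

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE: the masked local view is the positive for the global view."""
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([sim(anchor, positive)] +
                      [sim(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0])

negatives = [rng.normal(size=D) for _ in range(5)]  # stand-ins for other images
loss = info_nce(global_repr, local_repr, negatives)
print(float(loss))
```

In the actual model the masking operates inside Transformer blocks on learned keys; here random vectors stand in for them to keep the sketch runnable.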
```bash
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar
```

The augmented annotations can be downloaded from the SBD dataset. After downloading `SegmentationClassAug.zip`, unzip it and move it to `VOCdevkit/VOC2012`. The directory should be structured as follows:
```
VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject
```

```bash
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
```

Unzip the archives and place the train and validation images following the VOC-style directory structure:
```
MSCOCO/
├── annotations
├── JPEGImages
│   ├── train2014
│   └── val2014
└── SegmentationClass
    ├── train2014
    └── val2014
```

To generate VOC-style segmentation labels for the COCO dataset, use `parse_coco.py`:
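Conceptually, the conversion merges each image's per-instance annotations into one VOC-style PNG label map. A toy NumPy sketch of that merge (the overwrite rule for overlapping instances and the category ids are illustrative assumptions; `parse_coco.py` may differ):

```python
import numpy as np

def masks_to_label_map(masks, category_ids, h, w):
    """Merge instance masks (list of HxW bool arrays) into one VOC-style
    label map: 0 = background, pixel value = category id elsewhere."""
    label = np.zeros((h, w), dtype=np.uint8)
    for m, cid in zip(masks, category_ids):
        label[m] = cid           # later instances overwrite earlier ones
    return label

h = w = 4
m1 = np.zeros((h, w), bool); m1[:2, :2] = True    # first instance
m2 = np.zeros((h, w), bool); m2[1:3, 1:3] = True  # second instance, overlapping
label = masks_to_label_map([m1, m2], [15, 12], h, w)
print(label)
```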
```bash
python ./datasets/parse_coco.py --split train --year 2014 --to-voc12 false --coco-path $coco_path
python ./datasets/parse_coco.py --split val --year 2014 --to-voc12 false --coco-path $coco_path
```

```bash
git clone https://github.com/fwu11/mcc.git
cd mcc
conda create -n py38 python==3.8
conda activate py38
conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirement.txt
```

To use the regularized loss, download and compile the Python extension; see Here.
```bash
ln -s $your_dataset_path/VOCdevkit VOCdevkit
ln -s $your_dataset_path/MSCOCO MSCOCO
```

```bash
## for VOC
CUDA_VISIBLE_DEVICES="0,1,2,3" python -m torch.distributed.launch --nproc_per_node=4 --master_port=29501 scripts/dist_train_voc_seg_neg.py --work_dir work_dir_voc --spg 1
## for COCO
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python -m torch.distributed.launch --nproc_per_node=8 --master_port=29501 scripts/dist_train_coco_seg_neg.py --work_dir work_dir_coco --spg 1
```

```bash
## for VOC
python tools/infer_seg_voc.py --model_path $model_path --backbone vit_base_patch16_224 --infer val
## for COCO
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python -m torch.distributed.launch --nproc_per_node=8 --master_port=29501 tools/infer_seg_voc.py --model_path $model_path --backbone vit_base_patch16_224 --infer val
```

Here we report the performance on the VOC and COCO datasets. MS+CRF denotes multi-scale testing and CRF post-processing.
| Dataset | Backbone | val | Log | Weights | val (with MS+CRF) | test (with MS+CRF) |
|---|---|---|---|---|---|---|
| VOC | DeiT-B | 68.8 | log | weights | 70.3 | 71.2 |
| COCO | DeiT-B | 41.1 | log | weights | 42.3 | -- |
```
@inproceedings{wu2024masked,
  title={Masked Collaborative Contrast for Weakly Supervised Semantic Segmentation},
  author={Wu, Fangwen and He, Jingxuan and Yin, Yufei and Hao, Yanbin and Huang, Gang and Cheng, Lechao},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={862--871},
  year={2024}
}
```

Our code is developed based on ToCo. We also use the Regularized Loss and DenseCRF. We appreciate their great work.
