Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'Vision Transformer Adapter for Dense Predictions'

Method overview

name	Vision Transformer Adapter for Dense Predictions
challenge	pixel-level semantic labeling
details	ViT-Adapter-L, BEiT pre-train, multi-scale testing
publication	Vision Transformer Adapter for Dense Predictions Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao https://arxiv.org/abs/2205.08534
project page / code	https://github.com/czczup/ViT-Adapter
used Cityscapes data	fine annotations
used external data	ImageNet, Mapillary
runtime	n/a
subsampling	no
submission date	May, 2022
previous submissions

Average results

Metric	Value (%)
IoU Classes	85.2055
iIoU Classes	68.2575
IoU Categories	92.8165
iIoU Categories	83.4203
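The Cityscapes benchmark scores each class by intersection-over-union, IoU = TP / (TP + FP + FN), where TP, FP and FN are pixel counts determined over the whole test set. iIoU = iTP / (iTP + FP + iFN) additionally weights each TP and FN pixel by the ratio of the class's average instance size to the size of its ground-truth instance, so it is only reported for classes that have instances. The sketch below computes per-class IoU and its mean from a confusion matrix. It is a minimal illustration, not the official cityscapesScripts evaluation. The function names and the matrix layout are assumptions: rows are ground truth, columns are predictions, and void pixels have already been removed.

```python
import math


def per_class_iou(conf):
    """Return IoU = TP / (TP + FP + FN) for every class.

    conf[g][p] is the number of pixels whose ground-truth class is g and whose
    predicted class is p (assumed layout; void/ignore pixels already removed).
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # pixels of class c predicted as something else
        fp = sum(conf[g][c] for g in range(n)) - tp  # pixels of other classes predicted as c
        denom = tp + fp + fn
        # A class absent from both ground truth and prediction has no defined IoU.
        ious.append(tp / denom if denom else math.nan)
    return ious


def mean_iou(conf):
    """Average the per-class IoUs, skipping undefined (NaN) entries."""
    valid = [v for v in per_class_iou(conf) if not math.isnan(v)]
    return sum(valid) / len(valid)


# Toy 3-class example; the counts are made up for illustration.
conf = [
    [50, 2, 3],
    [4, 30, 1],
    [0, 5, 20],
]
print([round(v, 4) for v in per_class_iou(conf)])
print(round(mean_iou(conf), 4))
```

Because the counts are pooled before dividing, a few large images cannot be offset by many small ones. This is the reason the benchmark also reports the instance-weighted iIoU.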

Class results

Class	IoU (%)	iIoU (%)
road	98.8827	-
sidewalk	88.4947	-
building	94.4966	-
wall	66.745	-
fence	70.2345	-
pole	74.5308	-
traffic light	80.1767	-
traffic sign	83.5873	-
vegetation	94.3988	-
terrain	73.7165	-
sky	96.1833	-
person	89.674	75.8518
rider	79.0404	58.9152
car	96.6793	91.497
truck	85.5161	54.622
bus	94.4201	68.165
train	90.4794	66.7375
motorcycle	79.8667	60.3947
bicycle	81.7815	69.8768
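"IoU Classes" and "iIoU Classes" in the average results are the arithmetic means of the two columns above. The first is the mean over all 19 IoU values. The second is the mean over the 8 iIoU values, because classes marked "-" have no instance-level score. The short check below reproduces both averages from the table. The dictionary and helper name are illustrative only.

```python
# (IoU, iIoU) per class, copied from the class results table; None stands for "-".
CLASS_RESULTS = {
    "road": (98.8827, None),
    "sidewalk": (88.4947, None),
    "building": (94.4966, None),
    "wall": (66.745, None),
    "fence": (70.2345, None),
    "pole": (74.5308, None),
    "traffic light": (80.1767, None),
    "traffic sign": (83.5873, None),
    "vegetation": (94.3988, None),
    "terrain": (73.7165, None),
    "sky": (96.1833, None),
    "person": (89.674, 75.8518),
    "rider": (79.0404, 58.9152),
    "car": (96.6793, 91.497),
    "truck": (85.5161, 54.622),
    "bus": (94.4201, 68.165),
    "train": (90.4794, 66.7375),
    "motorcycle": (79.8667, 60.3947),
    "bicycle": (81.7815, 69.8768),
}


def mean_ignoring_missing(values):
    """Arithmetic mean over the entries that are not None."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


iou_classes = mean_ignoring_missing(iou for iou, _ in CLASS_RESULTS.values())
iiou_classes = mean_ignoring_missing(iiou for _, iiou in CLASS_RESULTS.values())
print(f"IoU Classes:  {iou_classes:.4f}")
print(f"iIoU Classes: {iiou_classes:.4f}")
```

Both printed values agree with the average results table to four decimal places.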

Category results

Category	IoU (%)	iIoU (%)
flat	98.7941	-
nature	94.0184	-
object	79.6169	-
sky	96.1833	-
construction	94.8586	-
human	89.8844	76.6379
vehicle	96.3597	90.2027
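"IoU Categories" and "iIoU Categories" are likewise the means of the 7 category IoUs and the 2 category iIoUs. A category score is not the average of its member classes. The classes are merged into one label first, so a confusion inside a category (for example person vs. rider) no longer counts as an error. This helps explain why human reaches 89.8844 while person and rider score 89.674 and 79.0404. It also explains why the sky category, which contains only the sky class, has exactly that class's IoU.

The sketch below collapses a class confusion matrix into categories and recomputes IoU. It is a simplification restricted to the 19 evaluated classes and ignores the extra handling of void labels in the official evaluation. The helper names and the toy matrix are hypothetical.

```python
# Category of each of the 19 evaluated classes, following the Cityscapes label definitions.
CATEGORY_OF = {
    "road": "flat", "sidewalk": "flat",
    "building": "construction", "wall": "construction", "fence": "construction",
    "pole": "object", "traffic light": "object", "traffic sign": "object",
    "vegetation": "nature", "terrain": "nature",
    "sky": "sky",
    "person": "human", "rider": "human",
    "car": "vehicle", "truck": "vehicle", "bus": "vehicle", "train": "vehicle",
    "motorcycle": "vehicle", "bicycle": "vehicle",
}


def collapse_confusion(conf, classes, category_of):
    """Sum class-level cells (rows = ground truth, cols = prediction) into category cells."""
    cats = sorted({category_of[c] for c in classes})
    idx = {cat: i for i, cat in enumerate(cats)}
    merged = [[0] * len(cats) for _ in cats]
    for g, g_name in enumerate(classes):
        for p, p_name in enumerate(classes):
            merged[idx[category_of[g_name]]][idx[category_of[p_name]]] += conf[g][p]
    return cats, merged


def iou(conf, c):
    """IoU = TP / (TP + FP + FN) for index c of a square confusion matrix."""
    tp = conf[c][c]
    fn = sum(conf[c]) - tp
    fp = sum(row[c] for row in conf) - tp
    return tp / (tp + fp + fn)


# Toy example (made-up counts): person and rider are often confused with each other.
classes = ["person", "rider", "car"]
conf = [[90, 10, 0], [8, 40, 2], [0, 1, 99]]
cats, merged = collapse_confusion(conf, classes, CATEGORY_OF)
print("class IoU:", [round(iou(conf, i), 4) for i in range(len(classes))])
print("category IoU:", {cat: round(iou(merged, i), 4) for i, cat in enumerate(cats)})

# Category values copied from the table above; their means give the average results.
CATEGORY_IOU = {"flat": 98.7941, "nature": 94.0184, "object": 79.6169, "sky": 96.1833,
                "construction": 94.8586, "human": 89.8844, "vehicle": 96.3597}
CATEGORY_IIOU = {"human": 76.6379, "vehicle": 90.2027}
iou_categories = sum(CATEGORY_IOU.values()) / len(CATEGORY_IOU)
iiou_categories = sum(CATEGORY_IIOU.values()) / len(CATEGORY_IIOU)
print(f"IoU Categories:  {iou_categories:.4f}")
print(f"iIoU Categories: {iiou_categories:.4f}")
```

In the toy matrix the merged human category scores higher than either person or rider alone, because the 18 person/rider swaps become true positives.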
