Method Details


Details for method 'Vision Transformer Adapter for Dense Predictions'

 

Method overview

name: Vision Transformer Adapter for Dense Predictions
challenge: pixel-level semantic labeling
details: ViT-Adapter-L, BEiT pre-training, multi-scale testing (see the inference sketch below)
publication: Vision Transformer Adapter for Dense Predictions
Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao
https://arxiv.org/abs/2205.08534
project page / code: https://github.com/czczup/ViT-Adapter
used Cityscapes data: fine annotations
used external data: ImageNet, Mapillary
runtime: n/a
subsampling: no
submission date: May 2022
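
The submission details list multi-scale testing. The following is a minimal PyTorch sketch of how multi-scale, flip-augmented inference for semantic segmentation is commonly implemented; the `model` callable, the scale set, and the assumption that the network outputs logits for the 19 Cityscapes evaluation classes are illustrative choices of this sketch, not the authors' exact configuration.

```python
import torch
import torch.nn.functional as F

def multi_scale_inference(model, image, scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75),
                          flip=True, num_classes=19):
    """Average logits over rescaled (and horizontally flipped) copies of the input.

    `model` is assumed to map a (1, 3, H, W) image tensor to (1, C, H, W) logits;
    the scale set and flipping are typical test-time settings, not necessarily the
    exact ones used for this submission.
    """
    _, _, h, w = image.shape
    fused = torch.zeros(1, num_classes, h, w, device=image.device)
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                               align_corners=False)
        logits = model(scaled)
        if flip:
            # Flip the input along the width axis, run the model, and flip back.
            logits = logits + torch.flip(model(torch.flip(scaled, dims=[3])), dims=[3])
        # Resize the logits back to the original resolution before accumulating.
        fused += F.interpolate(logits, size=(h, w), mode="bilinear",
                               align_corners=False)
    return fused.argmax(dim=1)  # (1, H, W) predicted class indices
```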

 

Average results

Metric Value [%]
IoU Classes 85.2055
iIoU Classes 68.2575
IoU Categories 92.8165
iIoU Categories 83.4203
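
For reference, the class-level IoU is the standard intersection over union, TP / (TP + FP + FN), accumulated over all test pixels of a class; iIoU additionally weights the contribution of each ground-truth instance by the ratio of the class's average instance size to that instance's size, and is therefore reported only for the instance classes and categories (the "-" entries in the tables below). A minimal sketch of the per-class IoU computation from a confusion matrix follows; the matrix layout (rows = ground truth, columns = prediction) is an assumption of the sketch, and the instance weighting used for iIoU is omitted.

```python
import numpy as np

def per_class_iou(confusion, eps=1e-12):
    """Per-class IoU and mean IoU from a (C, C) confusion matrix.

    Assumed layout: confusion[g, p] = number of pixels with ground-truth class g
    that were predicted as class p (a common convention, not mandated here).
    """
    tp = np.diag(confusion).astype(np.float64)
    fp = confusion.sum(axis=0) - tp  # predicted as the class, labeled otherwise
    fn = confusion.sum(axis=1) - tp  # labeled as the class, predicted otherwise
    iou = tp / np.maximum(tp + fp + fn, eps)
    return iou, iou.mean()
```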

 

Class results

Class IoU [%] iIoU [%]
road 98.8827 -
sidewalk 88.4947 -
building 94.4966 -
wall 66.745 -
fence 70.2345 -
pole 74.5308 -
traffic light 80.1767 -
traffic sign 83.5873 -
vegetation 94.3988 -
terrain 73.7165 -
sky 96.1833 -
person 89.674 75.8518
rider 79.0404 58.9152
car 96.6793 91.497
truck 85.5161 54.622
bus 94.4201 68.165
train 90.4794 66.7375
motorcycle 79.8667 60.3947
bicycle 81.7815 69.8768

 

Category results

Category IoU [%] iIoU [%]
flat 98.7941 -
nature 94.0184 -
object 79.6169 -
sky 96.1833 -
construction 94.8586 -
human 89.8844 76.6379
vehicle 96.3597 90.2027

 
