Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'Axial-DeepLab-L [Mapillary Vistas]'

Method overview

name	Axial-DeepLab-L [Mapillary Vistas]
challenge	panoptic semantic labeling
details	Convolution exploits locality for efficiency at the cost of missing long-range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent work shows that self-attention layers can be stacked into a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computational complexity and allows attention to be performed within a larger or even global region. Alongside this, we propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that can be stacked to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves PQ by 2.8% over the bottom-up state of the art on COCO test-dev. This previous state of the art is attained by our small variant, which is 3.8x more parameter-efficient and 27x more computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
publication	Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen. ECCV 2020 (spotlight). https://arxiv.org/abs/2003.07853
project page / code	https://github.com/csrhddlam/axial-deeplab
used Cityscapes data	fine annotations
used external data	ImageNet, Mapillary Vistas
runtime	n/a
subsampling	no
submission date	March, 2020
previous submissions
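
The method description states that factorizing 2D self-attention into two 1D self-attentions reduces computation. A rough way to see why is to count query-key pairs. The sketch below is only a counting argument, not the authors' implementation; the function names and the 64x64 feature-map size are hypothetical, and it assumes each 1D pass spans the full height or width.

```python
def full_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every pixel attends to every pixel (global 2D attention)."""
    n = height * width
    return n * n


def axial_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs for a height-axis pass followed by a width-axis pass."""
    n = height * width
    # Each pixel attends to its own column (height entries),
    # then to its own row (width entries).
    return n * height + n * width


h, w = 64, 64  # hypothetical feature-map size, chosen only for illustration
full = full_attention_pairs(h, w)
axial = axial_attention_pairs(h, w)
print(full, axial, full / axial)  # 16777216 524288 32.0
```

Under these assumptions the saving grows as H*W / (H + W), which is why attention over a larger or even global region becomes affordable.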

Average results

Metric	All	Things	Stuff
PQ	65.5536	56.8596	71.8765
SQ	83.021	80.9801	84.5053
RQ	78.1363	70.0513	84.0163

Class results

Class	PQ	SQ	RQ
road	98.6077	98.7379	99.8682
sidewalk	79.5288	85.7123	92.7857
building	89.8975	91.6782	98.0576
wall	44.058	78.1027	56.4103
fence	47.5243	77.8702	61.0301
pole	66.8175	72.8537	91.7146
traffic light	59.6734	78.0383	76.4668
traffic sign	72.8981	82.0785	88.8151
vegetation	91.2771	92.0294	99.1826
terrain	49.4441	78.9204	62.6506
sky	90.9151	93.5363	97.1976
person	55.8872	78.0042	71.6465
rider	52.8714	74.309	71.1507
car	68.8168	85.3298	80.648
truck	55.203	87.6557	62.9771
bus	64.4532	88.8313	72.5569
train	62.2083	84.8774	73.2919
motorcycle	50.5958	76.0528	66.5272
bicycle	44.8412	72.7803	61.6118
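
The tables above can be cross-checked with a short script. Panoptic quality factorizes per class as PQ = SQ x RQ, and the averages appear to be unweighted means over classes. The sketch below copies the values from the class table; the things/stuff split is an assumption based on the usual Cityscapes convention (eight instance classes), and the helper names are hypothetical.

```python
# Per-class (PQ, SQ, RQ) in percent, copied from the class results table.
CLASS_RESULTS = {
    "road": (98.6077, 98.7379, 99.8682),
    "sidewalk": (79.5288, 85.7123, 92.7857),
    "building": (89.8975, 91.6782, 98.0576),
    "wall": (44.058, 78.1027, 56.4103),
    "fence": (47.5243, 77.8702, 61.0301),
    "pole": (66.8175, 72.8537, 91.7146),
    "traffic light": (59.6734, 78.0383, 76.4668),
    "traffic sign": (72.8981, 82.0785, 88.8151),
    "vegetation": (91.2771, 92.0294, 99.1826),
    "terrain": (49.4441, 78.9204, 62.6506),
    "sky": (90.9151, 93.5363, 97.1976),
    "person": (55.8872, 78.0042, 71.6465),
    "rider": (52.8714, 74.309, 71.1507),
    "car": (68.8168, 85.3298, 80.648),
    "truck": (55.203, 87.6557, 62.9771),
    "bus": (64.4532, 88.8313, 72.5569),
    "train": (62.2083, 84.8774, 73.2919),
    "motorcycle": (50.5958, 76.0528, 66.5272),
    "bicycle": (44.8412, 72.7803, 61.6118),
}

# Assumption: the usual Cityscapes "things" (instance) classes.
THINGS = {"person", "rider", "car", "truck", "bus",
          "train", "motorcycle", "bicycle"}


def average(names, metric_index):
    """Unweighted mean of one metric (0=PQ, 1=SQ, 2=RQ) over the given classes."""
    values = [CLASS_RESULTS[name][metric_index] for name in names]
    return sum(values) / len(values)


all_names = list(CLASS_RESULTS)
stuff_names = [name for name in all_names if name not in THINGS]

# Largest per-class gap between the reported PQ and SQ * RQ / 100.
max_gap = max(abs(pq - sq * rq / 100) for pq, sq, rq in CLASS_RESULTS.values())

print("PQ all/things/stuff:",
      round(average(all_names, 0), 4),
      round(average(THINGS, 0), 4),
      round(average(stuff_names, 0), 4))
print("max |PQ - SQ*RQ/100|:", max_gap)
```

Note that the identity holds per class only: the averaged PQ is a mean of per-class products, so it is not equal to the averaged SQ times the averaged RQ.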
