Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'Axial-DeepLab-XL [Mapillary Vistas]'

Method overview

name	Axial-DeepLab-XL [Mapillary Vistas]
challenge	panoptic semantic labeling
details	Convolution exploits locality for efficiency at the cost of missing long-range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works have shown that it is possible to stack self-attention layers into a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computational complexity and allows attention to be performed within a larger or even global region. In addition, we propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that can be stacked to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves on the bottom-up state of the art on COCO test-dev by 2.8% PQ. The previous state of the art is attained by our small variant, which is 3.8x more parameter-efficient and 27x more computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
publication	Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen. ECCV 2020 (spotlight). https://arxiv.org/abs/2003.07853
project page / code	https://github.com/csrhddlam/axial-deeplab
used Cityscapes data	fine annotations
used external data	ImageNet, Mapillary Vistas
runtime	n/a
subsampling	no
submission date	April, 2020
previous submissions
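The factorization described in the details field can be sketched in NumPy as follows. This is a minimal, single-head illustration under stated assumptions, not the authors' implementation: the function names (`position_sensitive_1d`, `make_params`, `axial_block`, `attention_pairs`) and the random weights are hypothetical, and multi-head attention, normalization layers, and the full network are omitted. The 1D layer assumes the position-sensitive form in which the logit for query position o and key position p is q_o·k_p + q_o·r^q_(p-o) + k_p·r^k_(p-o), and the output is the softmax-weighted sum of v_p + r^v_(p-o); no 1/sqrt(d) scaling is applied.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_sensitive_1d(x, wq, wk, wv, rq, rk, rv):
    """Hypothetical single-head 1D attention over lines; x has shape (N, L, C).

    Each of the N lines attends only within itself. rq/rk/rv are relative
    position tables of shape (2L - 1, D), indexed by the offset p - o.
    """
    L = x.shape[1]
    q, k, v = x @ wq, x @ wk, x @ wv                        # (N, L, D)
    offset = np.arange(L)[None, :] - np.arange(L)[:, None]  # offset[o, p] = p - o
    idx = offset + (L - 1)                                  # shift into [0, 2L - 2]
    Rq, Rk, Rv = rq[idx], rk[idx], rv[idx]                  # each (L, L, D)
    logits = (np.einsum("nod,npd->nop", q, k)               # content term q_o . k_p
              + np.einsum("nod,opd->nop", q, Rq)            # query-position term
              + np.einsum("npd,opd->nop", k, Rk))           # key-position term
    w = softmax(logits, axis=-1)                            # normalize over keys p
    # Output mixes values plus value-position embeddings.
    return np.einsum("nop,npd->nod", w, v) + np.einsum("nop,opd->nod", w, Rv)


def make_params(c_in, d, length, rng, scale=0.1):
    # Hypothetical random initialization, for illustration only.
    ws = [rng.standard_normal((c_in, d)) * scale for _ in range(3)]
    rs = [rng.standard_normal((2 * length - 1, d)) * scale for _ in range(3)]
    return (*ws, *rs)


def axial_block(x, params_h, params_w):
    # x: (H, W, C). Height axis: treat each column as a line -> shape (W, H, C).
    y = position_sensitive_1d(x.transpose(1, 0, 2), *params_h).transpose(1, 0, 2)
    # Width axis: each row is already a line of shape (W, C).
    return position_sensitive_1d(y, *params_w)


def attention_pairs(h, w):
    # Query-key pairs scored: global 2D attention vs. height + width axial attention.
    return (h * w) ** 2, h * w * (h + w)


rng = np.random.default_rng(0)
H, W, C = 5, 7, 8
x = rng.standard_normal((H, W, C))
out = axial_block(x, make_params(C, C, H, rng), make_params(C, C, W, rng))
print(out.shape)                # (5, 7, 8)
print(attention_pairs(64, 64))  # (16777216, 524288)
```

Running attention along the height axis and then the width axis lets each output position depend on the whole H × W map while scoring only HW(H + W) query-key pairs instead of (HW)^2 for global 2D attention; for a 64 × 64 map that is 32x fewer.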

Average results

Metric (%)	All	Things	Stuff
PQ	66.5747	58.7294	72.2803
SQ	83.4589	81.3324	85.0055
RQ	78.9689	72.0244	84.0195
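PQ, SQ, and RQ follow the standard panoptic quality definition. For a class c, a predicted segment and a ground-truth segment are matched when their IoU exceeds 0.5; matched pairs form the true positives TP_c, unmatched predictions are false positives FP_c, and unmatched ground-truth segments are false negatives FN_c. PQ then factors into segmentation quality and recognition quality, and the summary rows average the per-class values over the set C of classes in the group:

```latex
\mathrm{PQ}_c
= \frac{\sum_{(p,g)\in \mathit{TP}_c} \mathrm{IoU}(p,g)}{|\mathit{TP}_c| + \tfrac{1}{2}|\mathit{FP}_c| + \tfrac{1}{2}|\mathit{FN}_c|}
= \underbrace{\frac{\sum_{(p,g)\in \mathit{TP}_c} \mathrm{IoU}(p,g)}{|\mathit{TP}_c|}}_{\mathrm{SQ}_c}
\times
\underbrace{\frac{|\mathit{TP}_c|}{|\mathit{TP}_c| + \tfrac{1}{2}|\mathit{FP}_c| + \tfrac{1}{2}|\mathit{FN}_c|}}_{\mathrm{RQ}_c},
\qquad
\mathrm{PQ}_{\mathcal{C}} = \frac{1}{|\mathcal{C}|}\sum_{c\in\mathcal{C}} \mathrm{PQ}_c
```

Because the All, Things, and Stuff rows are class averages, the product need not hold exactly for them: for All, 83.4589 × 0.789689 ≈ 65.91, not 66.57.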

Class results

Class	PQ (%)	SQ (%)	RQ (%)
road	98.7233	98.7558	99.967
sidewalk	79.7859	86.2808	92.4724
building	90.3012	91.8719	98.2903
wall	47.5893	78.4082	60.6943
fence	50.0232	78.7827	63.4951
pole	67.5514	73.8766	91.4381
traffic light	58.8692	79.2751	74.2594
traffic sign	73.3658	82.9527	88.443
vegetation	91.4788	92.2023	99.2153
terrain	46.3842	79.1589	58.5963
sky	91.0115	93.4955	97.3432
person	57.1917	78.4114	72.938
rider	54.9082	74.9161	73.2929
car	69.8002	85.4688	81.6674
truck	57.2699	87.6094	65.3697
bus	68.9942	88.4423	78.0105
train	62.6624	85.6386	73.1707
motorcycle	52.4558	76.8973	68.2154
bicycle	46.5525	73.2755	63.5308
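The averaged rows appear to be unweighted means over classes: all 19 classes for All, the 8 Cityscapes thing classes for Things, and the remaining 11 classes for Stuff. The check below uses only the numbers in the class table; it reproduces the PQ values of the average table and confirms PQ = SQ × RQ per class. The names `CLASS_RESULTS`, `THINGS`, `STUFF`, and `mean_metric` are illustrative.

```python
from statistics import mean

# (PQ, SQ, RQ) per class, in %, copied from the class results table.
CLASS_RESULTS = {
    "road": (98.7233, 98.7558, 99.967),
    "sidewalk": (79.7859, 86.2808, 92.4724),
    "building": (90.3012, 91.8719, 98.2903),
    "wall": (47.5893, 78.4082, 60.6943),
    "fence": (50.0232, 78.7827, 63.4951),
    "pole": (67.5514, 73.8766, 91.4381),
    "traffic light": (58.8692, 79.2751, 74.2594),
    "traffic sign": (73.3658, 82.9527, 88.443),
    "vegetation": (91.4788, 92.2023, 99.2153),
    "terrain": (46.3842, 79.1589, 58.5963),
    "sky": (91.0115, 93.4955, 97.3432),
    "person": (57.1917, 78.4114, 72.938),
    "rider": (54.9082, 74.9161, 73.2929),
    "car": (69.8002, 85.4688, 81.6674),
    "truck": (57.2699, 87.6094, 65.3697),
    "bus": (68.9942, 88.4423, 78.0105),
    "train": (62.6624, 85.6386, 73.1707),
    "motorcycle": (52.4558, 76.8973, 68.2154),
    "bicycle": (46.5525, 73.2755, 63.5308),
}

# Cityscapes "thing" classes (countable instances); the rest are "stuff".
THINGS = {"person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle"}
STUFF = [c for c in CLASS_RESULTS if c not in THINGS]


def mean_metric(classes, i=0):
    # Unweighted mean of metric i (0 = PQ, 1 = SQ, 2 = RQ) over the given classes.
    return mean(CLASS_RESULTS[c][i] for c in classes)


for name, group in [("All", CLASS_RESULTS), ("Things", THINGS), ("Stuff", STUFF)]:
    print(name, round(mean_metric(group), 4))
# All 66.5747
# Things 58.7294
# Stuff 72.2803

# Per class, PQ should equal SQ x RQ (RQ rescaled from % to a fraction).
max_gap = max(abs(pq - sq * rq / 100) for pq, sq, rq in CLASS_RESULTS.values())
print(max_gap < 0.01)  # True
```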
