Method Details


Details for method 'Axial-DeepLab-L [Mapillary Vistas]'

 

Method overview

name: Axial-DeepLab-L [Mapillary Vistas]
challenge: panoptic semantic labeling
details: Convolution exploits locality for efficiency at the cost of missing long-range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent work has shown that self-attention layers can be stacked to obtain a fully attentional network by restricting attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computational complexity and allows attention to be performed within a larger or even global region. In addition, we propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that can be stacked to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Axial-DeepLab improves PQ by 2.8% over the bottom-up state of the art on COCO test-dev. Our small variant matches this previous state of the art while being 3.8x more parameter-efficient and 27x more computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
publication: Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
  Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
  ECCV 2020 (spotlight)
  https://arxiv.org/abs/2003.07853
project page / code: https://github.com/csrhddlam/axial-deeplab
used Cityscapes data: fine annotations
used external data: ImageNet, Mapillary Vistas
runtime: n/a
subsampling: no
submission date: March 2020
previous submissions: (none listed)
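The factorization described in the details above can be sketched in NumPy. This is a simplified illustration, not the authors' implementation (which is in the linked repository): a single-head, global-span version of position-sensitive axial attention, assuming the formulation in which each output position o attends over all positions p on one axis with logits q_o·k_p + q_o·r^q_{p−o} + k_p·r^k_{p−o} and aggregates v_p + r^v_{p−o}. The function names (`axial_attention_1d`, `axial_attention_2d`, `make_params`), the shapes and the random weights are hypothetical, and details of the published model such as multi-head attention and the surrounding residual blocks are omitted.

```python
# Simplified, hypothetical sketch of position-sensitive axial attention.
# Single head, global span, random weights; not the official Axial-DeepLab code.
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axial_attention_1d(x, wq, wk, wv, rq, rk, rv):
    """Position-sensitive self-attention over a length-L sequence x: (L, C).

    wq, wk: (C, dq) and wv: (C, dv) are projections; rq, rk: (2L-1, dq) and
    rv: (2L-1, dv) are relative position embeddings indexed by p - o + L - 1.
    """
    L = x.shape[0]
    q, k, v = x @ wq, x @ wk, x @ wv
    # rel[o, p] = (p - o) shifted into [0, 2L-2] to index the embedding tables.
    rel = np.arange(L)[None, :] - np.arange(L)[:, None] + (L - 1)
    Rq, Rk, Rv = rq[rel], rk[rel], rv[rel]  # each (L, L, d)
    # Content term q_o.k_p plus positional terms q_o.rq_{p-o} and k_p.rk_{p-o}.
    logits = (q @ k.T
              + np.einsum("od,opd->op", q, Rq)
              + np.einsum("pd,opd->op", k, Rk))
    attn = softmax(logits, axis=-1)  # output o attends over every p on the axis
    # Aggregate position-sensitive values v_p + rv_{p-o}.
    return attn @ v + np.einsum("op,opd->od", attn, Rv)


def axial_attention_2d(x, params_h, params_w):
    """Height-axis attention followed by width-axis attention on x: (H, W, C)."""
    H, W, _ = x.shape
    # Height pass: each column x[:, j] is a length-H sequence.
    y = np.stack([axial_attention_1d(x[:, j], *params_h) for j in range(W)], axis=1)
    # Width pass: each row y[i] is a length-W sequence.
    return np.stack([axial_attention_1d(y[i], *params_w) for i in range(H)], axis=0)


def make_params(rng, length, c_in, dq, dv, scale=0.1):
    """Random weights for one axial layer whose axis has `length` positions."""
    n_rel = 2 * length - 1
    return (rng.standard_normal((c_in, dq)) * scale,
            rng.standard_normal((c_in, dq)) * scale,
            rng.standard_normal((c_in, dv)) * scale,
            rng.standard_normal((n_rel, dq)) * scale,
            rng.standard_normal((n_rel, dq)) * scale,
            rng.standard_normal((n_rel, dv)) * scale)


rng = np.random.default_rng(0)
H, W, C = 6, 8, 4
x = rng.standard_normal((H, W, C))
params_h = make_params(rng, H, C, dq=4, dv=C)
params_w = make_params(rng, W, C, dq=4, dv=C)
out = axial_attention_2d(x, params_h, params_w)
print(out.shape)  # (6, 8, 4)
```

Running the height pass and then the width pass lets every output position depend on every input position, which is the global receptive field the details refer to. Counting query-key pairs on an H×W map, global 2D attention compares (HW)² pairs while the two axial passes compare HW(H+W); for H = W = 64 that is 16,777,216 versus 524,288.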

 

Average results

Metric (%)  All      Things   Stuff
PQ          65.5536  56.8596  71.8765
SQ          83.021   80.9801  84.5053
RQ          78.1363  70.0513  84.0163

 

Class results

Class          PQ (%)   SQ (%)   RQ (%)
road           98.6077  98.7379  99.8682
sidewalk       79.5288  85.7123  92.7857
building       89.8975  91.6782  98.0576
wall           44.058   78.1027  56.4103
fence          47.5243  77.8702  61.0301
pole           66.8175  72.8537  91.7146
traffic light  59.6734  78.0383  76.4668
traffic sign   72.8981  82.0785  88.8151
vegetation     91.2771  92.0294  99.1826
terrain        49.4441  78.9204  62.6506
sky            90.9151  93.5363  97.1976
person         55.8872  78.0042  71.6465
rider          52.8714  74.309   71.1507
car            68.8168  85.3298  80.648
truck          55.203   87.6557  62.9771
bus            64.4532  88.8313  72.5569
train          62.2083  84.8774  73.2919
motorcycle     50.5958  76.0528  66.5272
bicycle        44.8412  72.7803  61.6118
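The Average results table appears to be an unweighted mean over the per-class rows: All over the 19 classes, Things over the 8 Cityscapes thing classes (person, rider, car, truck, bus, train, motorcycle, bicycle), and Stuff over the remaining 11. This averaging rule is inferred from the numbers, not stated on this page. The short Python check below assumes that rule and recomputes the averages from the class table. It also checks, for each class, the standard panoptic-quality identity PQ = SQ × RQ (divided by 100 because the values are percentages). The names `CLASS_RESULTS`, `THINGS` and `averages` are illustrative.

```python
# Per-class results (PQ, SQ, RQ), in percent, copied from the class table above.
CLASS_RESULTS = {
    "road": (98.6077, 98.7379, 99.8682),
    "sidewalk": (79.5288, 85.7123, 92.7857),
    "building": (89.8975, 91.6782, 98.0576),
    "wall": (44.058, 78.1027, 56.4103),
    "fence": (47.5243, 77.8702, 61.0301),
    "pole": (66.8175, 72.8537, 91.7146),
    "traffic light": (59.6734, 78.0383, 76.4668),
    "traffic sign": (72.8981, 82.0785, 88.8151),
    "vegetation": (91.2771, 92.0294, 99.1826),
    "terrain": (49.4441, 78.9204, 62.6506),
    "sky": (90.9151, 93.5363, 97.1976),
    "person": (55.8872, 78.0042, 71.6465),
    "rider": (52.8714, 74.309, 71.1507),
    "car": (68.8168, 85.3298, 80.648),
    "truck": (55.203, 87.6557, 62.9771),
    "bus": (64.4532, 88.8313, 72.5569),
    "train": (62.2083, 84.8774, 73.2919),
    "motorcycle": (50.5958, 76.0528, 66.5272),
    "bicycle": (44.8412, 72.7803, 61.6118),
}

# Cityscapes "thing" classes; the other 11 classes are "stuff".
THINGS = {"person", "rider", "car", "truck", "bus", "train", "motorcycle", "bicycle"}


def averages(metric):
    """Unweighted class means (All, Things, Stuff); metric 0=PQ, 1=SQ, 2=RQ."""
    def mean(classes):
        values = [CLASS_RESULTS[c][metric] for c in classes]
        return sum(values) / len(values)

    things = [c for c in CLASS_RESULTS if c in THINGS]
    stuff = [c for c in CLASS_RESULTS if c not in THINGS]
    return mean(CLASS_RESULTS), mean(things), mean(stuff)


# Per class, PQ = SQ * RQ (percentages, so divide by 100), up to rounding.
for name, (pq, sq, rq) in CLASS_RESULTS.items():
    assert abs(pq - sq * rq / 100) < 1e-2, name

for i, metric in enumerate(("PQ", "SQ", "RQ")):
    print(metric, [round(v, 4) for v in averages(i)])
```

With these values, the printed means agree with the Average results table to four decimal places, which supports reading All, Things and Stuff as plain class averages rather than pixel- or instance-weighted ones.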
