Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'PAC: Perspective-adaptive Convolutions'

Method overview

name	PAC: Perspective-adaptive Convolutions
challenge	pixel-level semantic labeling
details	Many existing scene parsing methods adopt Convolutional Neural Networks with receptive fields of fixed sizes and shapes, which frequently results in inconsistent predictions for large objects and in small objects being missed. To tackle this issue, we propose perspective-adaptive convolutions that acquire receptive fields of flexible sizes and shapes during scene parsing. By adding a new perspective regression layer, we dynamically infer position-adaptive perspective coefficient vectors, which are used to reshape the convolutional patches. Consequently, the receptive fields adjust automatically to the various sizes and perspective deformations of the objects in scene images. The proposed convolutions are differentiable, so the convolutional parameters and perspective coefficients are learned end-to-end without any extra supervision on object sizes. Furthermore, since standard convolutions lack contextual information and spatial dependencies, we propose a context-adaptive bias that captures both local and global contextual information by average pooling over the local feature patches and the global feature maps, followed by a flexible attentive summation with the convolutional results. The attentive weights are position-adaptive and context-aware, and are learned through an additional context regression layer. Experiments on the Cityscapes and ADE20K datasets demonstrate the effectiveness of the proposed methods.
publication	Perspective-adaptive Convolutions for Scene Parsing Rui Zhang, Sheng Tang, Yongdong Zhang, Jintao Li, and Shuicheng Yan IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) https://ieeexplore.ieee.org/document/8598804
project page / code
used Cityscapes data	fine annotations
used external data	ImageNet
runtime	n/a
subsampling	no
submission date	March, 2018
previous submissions
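The "details" entry above describes two mechanisms without giving formulas. The NumPy sketch below is one possible reading, not the authors' implementation. It assumes that the perspective coefficient vector at each position consists of two scale factors (s_y, s_x) that stretch a 3x3 sampling grid, with off-grid samples read by bilinear interpolation. It also assumes that the context-adaptive bias mixes the locally and globally average-pooled features with two per-position weights `attn` and maps them to the output channels with a hypothetical matrix `proj`. In the paper the coefficients and weights come from the perspective and context regression layers; here they are passed in as arrays. All function and parameter names are hypothetical.

```python
import numpy as np


def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (C, H, W) at real-valued (y, x); zeros outside the map."""
    _, H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(feat.shape[0])
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            w = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
            if w > 0 and 0 <= yy < H and 0 <= xx < W:
                out += w * feat[:, yy, xx]
    return out


def perspective_adaptive_conv(feat, weight, coeffs):
    """Hypothetical PAC sketch.

    feat:   (C_in, H, W) input features
    weight: (C_out, C_in, 3, 3) convolution kernel
    coeffs: (2, H, W) assumed per-position scale factors (s_y, s_x) that
            stretch the 3x3 sampling grid (stand-in for the perspective
            regression layer's output)
    """
    c_out = weight.shape[0]
    _, H, W = feat.shape
    out = np.zeros((c_out, H, W))
    offsets = (-1, 0, 1)
    for i in range(H):
        for j in range(W):
            sy, sx = coeffs[:, i, j]
            for a, dy in enumerate(offsets):
                for b, dx in enumerate(offsets):
                    v = bilinear_sample(feat, i + sy * dy, j + sx * dx)
                    out[:, i, j] += weight[:, :, a, b] @ v
    return out


def context_adaptive_bias(feat, proj, attn, k=3):
    """Hypothetical context-adaptive bias sketch.

    proj: (C_out, C_in) assumed projection of pooled context to output channels
    attn: (2, H, W) assumed per-position weights for local and global context
    """
    _, H, W = feat.shape
    global_ctx = feat.mean(axis=(1, 2))            # global average pooling
    pad = k // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)))
    bias = np.zeros((proj.shape[0], H, W))
    for i in range(H):
        for j in range(W):
            # average pooling over the k x k patch centred at (i, j)
            local_ctx = padded[:, i:i + k, j:j + k].mean(axis=(1, 2))
            mixed = attn[0, i, j] * local_ctx + attn[1, i, j] * global_ctx
            bias[:, i, j] = proj @ mixed
    return bias


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
weight = rng.standard_normal((6, 4, 3, 3)) * 0.1
coeffs = 1.0 + rng.random((2, 8, 8))       # stand-in for regressed perspective coefficients
attn = rng.random((2, 8, 8))               # stand-in for regressed attentive weights
proj = rng.standard_normal((6, 4)) * 0.1
y = perspective_adaptive_conv(feat, weight, coeffs) + context_adaptive_bias(feat, proj, attn)
print(y.shape)  # (6, 8, 8)
```

With all scale factors equal to 1 the sketch reduces to an ordinary zero-padded 3x3 convolution, and with all factors equal to 2 it behaves like a dilation-2 convolution; letting the factors vary per position is what makes the receptive field position-adaptive. The loops are written for readability; a trainable version would express the same sampling in an autodiff framework so that gradients reach the coefficients.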

Average results

Metric	Value
IoU Classes	78.8983
iIoU Classes	55.6839
IoU Categories	90.6883
iIoU Categories	78.3441
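The values above are percentages. On the Cityscapes benchmark, the per-class IoU is TP / (TP + FP + FN), counted over all test pixels, and "IoU Classes" averages it over the 19 evaluation classes. iIoU instead weights true-positive and false-negative pixels by the ratio of the class's average instance size to the size of the respective ground-truth instance, and is reported only for the instance classes (person through bicycle), which is why the other rows show "-". The sketch below covers plain IoU only; the function name is illustrative, and void pixels are assumed to be excluded beforehand.

```python
import numpy as np


def per_class_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN) from a confusion matrix.

    conf[g, p] counts pixels whose ground-truth class is g and predicted class is p;
    void/ignored pixels are assumed to have been excluded already.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as c, labelled as something else
    fn = conf.sum(axis=1) - tp  # labelled as c, predicted as something else
    return tp / (tp + fp + fn)


# Toy 2-class example
conf = [[5, 1],
        [2, 4]]
ious = per_class_iou(conf)
print(ious)          # class 0: 5/8, class 1: 4/7
print(ious.mean())   # mean IoU over the two classes
```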

Class results

Class	IoU	iIoU
road	98.7114	-
sidewalk	86.9318	-
building	93.3459	-
wall	58.8669	-
fence	60.3572	-
pole	65.7715	-
traffic light	73.0131	-
traffic sign	78.335	-
vegetation	93.5518	-
terrain	72.8317	-
sky	95.6082	-
person	85.9924	67.1502
rider	71.2982	48.5209
car	95.977	90.3585
truck	73.396	42.328
bus	82.3653	51.6824
train	69.5079	42.1912
motorcycle	67.2574	43.236
bicycle	75.9494	60.0042

Category results

Category	IoU	iIoU
flat	98.7076	-
nature	93.2267	-
object	72.2191	-
sky	95.6082	-
construction	93.5416	-
human	86.1319	68.3796
vehicle	95.3833	88.3086
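The "Average results" table appears to be the unweighted mean of the rows in the class and category tables above. The short check below, using only the standard library, recomputes three of the averages from the listed per-row values; rounding to four decimals reproduces the reported figures.

```python
from statistics import mean

class_iou = {
    "road": 98.7114, "sidewalk": 86.9318, "building": 93.3459, "wall": 58.8669,
    "fence": 60.3572, "pole": 65.7715, "traffic light": 73.0131,
    "traffic sign": 78.335, "vegetation": 93.5518, "terrain": 72.8317,
    "sky": 95.6082, "person": 85.9924, "rider": 71.2982, "car": 95.977,
    "truck": 73.396, "bus": 82.3653, "train": 69.5079, "motorcycle": 67.2574,
    "bicycle": 75.9494,
}
category_iou = {
    "flat": 98.7076, "nature": 93.2267, "object": 72.2191, "sky": 95.6082,
    "construction": 93.5416, "human": 86.1319, "vehicle": 95.3833,
}
category_iiou = {"human": 68.3796, "vehicle": 88.3086}

print(round(mean(class_iou.values()), 4))      # 78.8983  (IoU Classes)
print(round(mean(category_iou.values()), 4))   # 90.6883  (IoU Categories)
print(round(mean(category_iiou.values()), 4))  # 78.3441  (iIoU Categories)
```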
