Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'Naive-Student (iterative semi-supervised learning with Panoptic-DeepLab)'

Method overview

name	Naive-Student (iterative semi-supervised learning with Panoptic-DeepLab)
challenge	instance-level semantic labeling
details	Supervised learning in large discriminative models is a mainstay for modern computer vision. Such an approach necessitates investing in large-scale human-annotated datasets for achieving state-of-the-art results. In turn, the efficacy of supervised learning may be limited by the size of the human-annotated dataset. This limitation is particularly notable for image segmentation tasks, where the expense of human annotation is especially large, yet large amounts of unlabeled data may exist. In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences to improve the performance on urban scene segmentation, simultaneously tackling semantic, instance, and panoptic segmentation. The goal of this work is to avoid the construction of sophisticated, learned architectures specific to label propagation (e.g., patch matching and optical flow). Instead, we simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data. The procedure is iterated several times. As a result, our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results on all three Cityscapes benchmarks, reaching 67.8% PQ, 42.6% AP, and 85.2% mIOU on the test set. We view this work as a notable step towards building a simple procedure to harness unlabeled video sequences to surpass state-of-the-art performance on core computer vision tasks.
publication	Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation Liang-Chieh Chen, Raphael Gontijo Lopes, Bowen Cheng, Maxwell D. Collins, Ekin D. Cubuk, Barret Zoph, Hartwig Adam, Jonathon Shlens https://arxiv.org/abs/2005.10266
project page / code
used Cityscapes data	fine annotations, video
used external data	ImageNet, Mapillary Vistas Research Edition, Cityscapes train-extra set (images only; the coarse labels are not used).
runtime	n/a
subsampling	no
submission date	April, 2020
previous submissions
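
The iterative procedure described under "details" (predict pseudo-labels for the unlabeled data, train the next model on human-annotated plus pseudo-labeled data, repeat) can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: `train` is a hypothetical nearest-class-mean classifier standing in for Panoptic-DeepLab training, the 1-D float features stand in for images, and the iteration count of 2 is an arbitrary assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[float, int]      # (feature, label); stand-in for (image, annotation)
Model = Callable[[float], int]   # maps a feature to a predicted label


def train(examples: Sequence[Example]) -> Model:
    """Hypothetical trainer: nearest-class-mean classifier (NOT Panoptic-DeepLab)."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    # Predict the class whose mean is closest to the input feature.
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def naive_student(labeled: Sequence[Example],
                  unlabeled: Sequence[float],
                  iterations: int = 2) -> Model:
    """Iterative pseudo-labeling: each student becomes the next teacher."""
    model = train(labeled)  # initial teacher uses human annotations only
    for _ in range(iterations):
        # Teacher predicts pseudo-labels for the unlabeled data.
        pseudo: List[Example] = [(x, model(x)) for x in unlabeled]
        # Student trains on human-annotated and pseudo-labeled data together.
        model = train(list(labeled) + pseudo)
    return model


labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [0.5, 2.0, 3.0, 7.0, 8.0, 9.5]
student = naive_student(labeled, unlabeled)
```

In this sketch every pseudo-label is kept; any filtering or weighting of pseudo-labels would be a further design choice that this page does not describe.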

Average results

Metric	Value (%)
AP	42.5605
AP50%	67.6073
AP100m	57.9371
AP50m	59.7797

Class results

Class	AP	AP50%	AP100m	AP50m
person	40.4725	71.8091	59.8944	60.0752
rider	35.3419	69.0848	50.807	51.3686
car	59.96	82.9472	80.1792	82.5748
truck	44.6579	56.5959	59.4973	65.6651
bus	53.4191	68.81	73.2215	79.6308
train	44.1044	65.6381	55.6078	53.7842
motorcycle	35.8067	66.2296	45.0608	45.9902
bicycle	26.7219	59.7432	39.2291	39.1488
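
The values under "Average results" appear to be the unweighted mean of the per-class values above. The short check below, using only the numbers from the two tables, is a sanity check under that assumption rather than a description of the benchmark's evaluation code; the variable names are illustrative.

```python
# Per-class results copied from the "Class results" table:
# (AP, AP50%, AP100m, AP50m)
class_results = {
    "person":     (40.4725, 71.8091, 59.8944, 60.0752),
    "rider":      (35.3419, 69.0848, 50.8070, 51.3686),
    "car":        (59.9600, 82.9472, 80.1792, 82.5748),
    "truck":      (44.6579, 56.5959, 59.4973, 65.6651),
    "bus":        (53.4191, 68.8100, 73.2215, 79.6308),
    "train":      (44.1044, 65.6381, 55.6078, 53.7842),
    "motorcycle": (35.8067, 66.2296, 45.0608, 45.9902),
    "bicycle":    (26.7219, 59.7432, 39.2291, 39.1488),
}

# Reported values from the "Average results" table, in the same column order.
reported = (42.5605, 67.6073, 57.9371, 59.7797)

# Unweighted mean over the eight classes, one value per metric column.
columns = zip(*class_results.values())
means = tuple(sum(col) / len(class_results) for col in columns)

for name, mean, rep in zip(("AP", "AP50%", "AP100m", "AP50m"), means, reported):
    print(f"{name}: mean of classes = {mean:.4f}, reported = {rep:.4f}")
```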
