Semantic Understanding of Urban Street Scenes

Method Details

Details for method 'FarSee-Net'

Method overview

name	FarSee-Net
challenge	pixel-level semantic labeling
details	FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution. Real-time semantic segmentation is desirable in many robotic applications with limited computation resources. One challenge of semantic segmentation is to deal with object scale variations and leverage the context. How to perform multi-scale context aggregation within a limited computation budget is important. In this paper, firstly, we introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP). It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information. On the other hand, for runtime efficiency, state-of-the-art methods quickly decrease the spatial size of the inputs or feature maps in the early network stages. The final high-resolution result is usually obtained by a non-parametric up-sampling operation (e.g. bilinear interpolation). In contrast, we rethink this pipeline and treat it as a super-resolution process. We use an optimized super-resolution operation in the up-sampling step and improve the accuracy, especially in the sub-sampled input image scenario for real-time applications. By fusing the above two improvements, our method provides a better latency-accuracy trade-off than other state-of-the-art methods. In particular, we achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single NVIDIA Titan X (Maxwell) GPU card. The proposed module can be plugged into any feature extraction CNN and benefits from advances in CNN architecture.
publication	FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution. Zhanpeng Zhang and Kaipeng Zhang. IEEE International Conference on Robotics and Automation (ICRA), 2020
project page / code
used Cityscapes data	fine annotations
used external data	ImageNet
runtime	0.0119 s, NVIDIA Titan X (Maxwell)
subsampling	2
submission date	September, 2019
previous submissions
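
The runtime and the frame rate quoted in the details field describe the same measurement. A minimal sketch of the conversion, assuming the listed runtime is the per-image latency (the variable names are illustrative, not taken from the benchmark):

```python
# Per-image runtime as listed in the method overview, in seconds.
runtime_s = 0.0119

# Throughput is the reciprocal of the per-image latency.
fps = 1.0 / runtime_s

print(f"{fps:.1f} frames per second")
```

Rounded to the nearest integer, this agrees with the 84 fps stated in the details field.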

Average results

Metric	Value
IoU Classes	68.3661
iIoU Classes	39.3392
IoU Categories	85.9333
iIoU Categories	69.7428

Class results

Class	IoU	iIoU
road	97.9262	-
sidewalk	81.4137	-
building	89.8609	-
wall	38.6137	-
fence	43.5661	-
pole	53.1829	-
traffic light	58.8496	-
traffic sign	64.3553	-
vegetation	91.0366	-
terrain	67.7472	-
sky	94.0262	-
person	75.9136	53.9215
rider	57.292	29.017
car	93.2387	86.0932
truck	55.876	19.2969
bus	67.7575	30.1973
train	55.0709	28.1058
motorcycle	49.3623	23.3532
bicycle	63.8673	44.7284
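
The "IoU Classes" and "iIoU Classes" averages can be cross-checked against the per-class rows above. The sketch below assumes each average is the unweighted mean over the classes that report a value (19 classes for IoU, 8 for iIoU); the dictionary and variable names are illustrative.

```python
# Per-class IoU values copied from the class results table (percent).
class_iou = {
    "road": 97.9262, "sidewalk": 81.4137, "building": 89.8609,
    "wall": 38.6137, "fence": 43.5661, "pole": 53.1829,
    "traffic light": 58.8496, "traffic sign": 64.3553,
    "vegetation": 91.0366, "terrain": 67.7472, "sky": 94.0262,
    "person": 75.9136, "rider": 57.292, "car": 93.2387,
    "truck": 55.876, "bus": 67.7575, "train": 55.0709,
    "motorcycle": 49.3623, "bicycle": 63.8673,
}

# iIoU is reported only for classes with instance annotations.
class_iiou = {
    "person": 53.9215, "rider": 29.017, "car": 86.0932,
    "truck": 19.2969, "bus": 30.1973, "train": 28.1058,
    "motorcycle": 23.3532, "bicycle": 44.7284,
}

# Unweighted means over the classes that report a value.
mean_iou = sum(class_iou.values()) / len(class_iou)
mean_iiou = sum(class_iiou.values()) / len(class_iiou)

print(f"mean IoU:  {mean_iou:.4f}")
print(f"mean iIoU: {mean_iiou:.4f}")
```

Both means agree with the reported averages (68.3661 and 39.3392) to within the rounding of the table entries.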

Category results

Category	IoU	iIoU
flat	98.1201	-
nature	90.5937	-
object	60.1353	-
sky	94.0262	-
construction	89.863	-
human	76.546	55.8137
vehicle	92.2484	83.6719
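
The category-level averages can be checked the same way. This sketch again assumes an unweighted mean over the categories that report a value (7 for IoU, 2 for iIoU); the names are illustrative.

```python
# Per-category values copied from the category results table (percent).
category_iou = {
    "flat": 98.1201, "nature": 90.5937, "object": 60.1353,
    "sky": 94.0262, "construction": 89.863,
    "human": 76.546, "vehicle": 92.2484,
}
category_iiou = {"human": 55.8137, "vehicle": 83.6719}

# Unweighted means over the categories that report a value.
mean_cat_iou = sum(category_iou.values()) / len(category_iou)
mean_cat_iiou = sum(category_iiou.values()) / len(category_iiou)

print(f"mean category IoU:  {mean_cat_iou:.4f}")
print(f"mean category iIoU: {mean_cat_iiou:.4f}")
```

The results match the reported "IoU Categories" (85.9333) and "iIoU Categories" (69.7428) up to rounding in the last digit.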
