Method Details
Details for method 'FarSee-Net'
Method overview
name | FarSee-Net |
challenge | pixel-level semantic labeling |
details | FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution. Real-time semantic segmentation is desirable in many robotic applications with limited computation resources. One challenge of semantic segmentation is to deal with object scale variations and leverage the context. How to perform multi-scale context aggregation within a limited computation budget is important. In this paper, we first introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP). It is a lightweight cascaded structure for Convolutional Neural Networks (CNNs) to efficiently leverage context information. Second, for runtime efficiency, state-of-the-art methods quickly decrease the spatial size of the inputs or feature maps in the early network stages. The final high-resolution result is usually obtained by a non-parametric up-sampling operation (e.g., bilinear interpolation). In contrast, we rethink this pipeline and treat it as a super-resolution process. We use an optimized super-resolution operation in the up-sampling step and improve the accuracy, especially in the sub-sampled input image scenario for real-time applications. By fusing the above two improvements, our method provides a better latency-accuracy trade-off than other state-of-the-art methods. In particular, we achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single NVIDIA Titan X (Maxwell) GPU card. The proposed module can be plugged into any feature extraction CNN and benefits from the CNN structure development. |
publication | FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution. Zhanpeng Zhang and Kaipeng Zhang. IEEE International Conference on Robotics and Automation (ICRA), 2020 |
project page / code | |
used Cityscapes data | fine annotations |
used external data | ImageNet |
runtime | 0.0119 s NVIDIA Titan X (Maxwell) |
subsampling | 2 |
submission date | September 2019 |
previous submissions |
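The runtime entry and the 84 fps quoted in the details can be cross-checked with a one-line conversion. The sketch below assumes the listed runtime is the per-frame inference time on the stated GPU; the variable names are illustrative only.

```python
# Convert the per-frame runtime from the method overview into frames per second.
# Assumption: 0.0119 s is the time to process a single frame.
runtime_s = 0.0119

fps = 1.0 / runtime_s
print(f"{fps:.1f} fps")
```

Rounded to the nearest integer, this gives the 84 fps figure stated in the details.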
Average results
Metric | Value |
---|---|
IoU Classes | 68.3661 |
iIoU Classes | 39.3392 |
IoU Categories | 85.9333 |
iIoU Categories | 69.7428 |
Class results
Class | IoU | iIoU |
---|---|---|
road | 97.9262 | - |
sidewalk | 81.4137 | - |
building | 89.8609 | - |
wall | 38.6137 | - |
fence | 43.5661 | - |
pole | 53.1829 | - |
traffic light | 58.8496 | - |
traffic sign | 64.3553 | - |
vegetation | 91.0366 | - |
terrain | 67.7472 | - |
sky | 94.0262 | - |
person | 75.9136 | 53.9215 |
rider | 57.292 | 29.017 |
car | 93.2387 | 86.0932 |
truck | 55.876 | 19.2969 |
bus | 67.7575 | 30.1973 |
train | 55.0709 | 28.1058 |
motorcycle | 49.3623 | 23.3532 |
bicycle | 63.8673 | 44.7284 |
Category results
Category | IoU | iIoU |
---|---|---|
flat | 98.1201 | - |
nature | 90.5937 | - |
object | 60.1353 | - |
sky | 94.0262 | - |
construction | 89.863 | - |
human | 76.546 | 55.8137 |
vehicle | 92.2484 | 83.6719 |
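The values in the Average results table can be reproduced from the per-class and per-category rows. The sketch below assumes each average is an unweighted mean over the rows that list a value, so the iIoU averages skip rows marked "-". The dictionary names are illustrative and not part of the benchmark.

```python
from statistics import mean

# Per-class IoU and iIoU, copied from the Class results table.
class_iou = {
    "road": 97.9262, "sidewalk": 81.4137, "building": 89.8609,
    "wall": 38.6137, "fence": 43.5661, "pole": 53.1829,
    "traffic light": 58.8496, "traffic sign": 64.3553,
    "vegetation": 91.0366, "terrain": 67.7472, "sky": 94.0262,
    "person": 75.9136, "rider": 57.292, "car": 93.2387,
    "truck": 55.876, "bus": 67.7575, "train": 55.0709,
    "motorcycle": 49.3623, "bicycle": 63.8673,
}
class_iiou = {
    "person": 53.9215, "rider": 29.017, "car": 86.0932, "truck": 19.2969,
    "bus": 30.1973, "train": 28.1058, "motorcycle": 23.3532,
    "bicycle": 44.7284,
}

# Per-category IoU and iIoU, copied from the Category results table.
category_iou = {
    "flat": 98.1201, "nature": 90.5937, "object": 60.1353, "sky": 94.0262,
    "construction": 89.863, "human": 76.546, "vehicle": 92.2484,
}
category_iiou = {"human": 55.8137, "vehicle": 83.6719}

# Values listed in the Average results table.
reported = {
    "IoU Classes": 68.3661,
    "iIoU Classes": 39.3392,
    "IoU Categories": 85.9333,
    "iIoU Categories": 69.7428,
}

# Unweighted means over the rows that have a value (an assumption, see above).
computed = {
    "IoU Classes": mean(class_iou.values()),
    "iIoU Classes": mean(class_iiou.values()),
    "IoU Categories": mean(category_iou.values()),
    "iIoU Categories": mean(category_iiou.values()),
}

for name, value in computed.items():
    print(f"{name}: computed {value:.4f}, reported {reported[name]:.4f}")
```

Under that assumption, each computed mean falls within 1e-4 of the reported average. The small residuals are consistent with the table rows themselves being rounded.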