Method Details


Details for method 'NCTU-ITRI'

 

Method overview

name NCTU-ITRI
challenge pixel-level semantic labeling
details For fast semantic segmentation, we design DSNet, a CNN-based encoder-decoder architecture. The encoder is built on the DenseNet concept, and a simple decoder keeps the network efficient without degrading accuracy. We pre-train the encoder on the ImageNet dataset, then train the complete DSNet using only the fine-annotated Cityscapes dataset (2975 training images). DSNet offers a good trade-off between accuracy and speed: it processes 68 frames per second on 1024x512 images on a single GTX 1080 Ti GPU.
publication Anonymous
project page / code
used Cityscapes data fine annotations
used external data ImageNet
runtime 0.0147 s (GTX 1080 Ti)
subsampling 2
submission date July, 2018
previous submissions
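
The reported runtime and the frame rate quoted in the details are two views of the same measurement. A minimal sketch of the conversion, assuming the runtime is the per-image inference time at the 1024x512 input resolution stated above (the variable names are illustrative, not taken from the submission):

```python
# Reported per-image runtime in seconds (from the "runtime" field).
runtime_s = 0.0147

# Input resolution quoted in the method details (assumed to be the
# resolution at which the runtime was measured).
width, height = 1024, 512

# Frames per second is the reciprocal of the per-image runtime.
fps = 1.0 / runtime_s

# Pixel throughput at that resolution.
pixels_per_second = width * height * fps

print(f"{fps:.1f} frames per second")
print(f"{pixels_per_second / 1e6:.1f} megapixels per second")
```

The reciprocal of 0.0147 s rounds to the 68 frames per second stated in the details.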

 

Average results

Metric Value (%)
IoU Classes 69.1013
iIoU Classes 41.4343
IoU Categories 86.7618
iIoU Categories 70.8417

 

Class results

Class IoU (%) iIoU (%)
road 97.9662 -
sidewalk 82.0702 -
building 90.3721 -
wall 42.3574 -
fence 45.8075 -
pole 56.3838 -
traffic light 61.1151 -
traffic sign 66.6427 -
vegetation 91.7103 -
terrain 70.0256 -
sky 94.2527 -
person 77.2477 56.8646
rider 59.1464 33.1569
car 93.2486 85.5851
truck 49.5487 19.2678
bus 59.3626 33.2648
train 56.3247 31.9048
motorcycle 53.5447 27.0271
bicycle 65.7977 44.4035
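
The class averages in the "Average results" table can be cross-checked against the per-class rows above. The sketch below assumes the averages are plain unweighted means over the classes that have a score (19 classes for IoU, 8 for iIoU); that assumption is not stated on this page, but it reproduces the reported values to within rounding.

```python
# Per-class scores copied from the class results table (percent).
class_iou = {
    "road": 97.9662, "sidewalk": 82.0702, "building": 90.3721,
    "wall": 42.3574, "fence": 45.8075, "pole": 56.3838,
    "traffic light": 61.1151, "traffic sign": 66.6427,
    "vegetation": 91.7103, "terrain": 70.0256, "sky": 94.2527,
    "person": 77.2477, "rider": 59.1464, "car": 93.2486,
    "truck": 49.5487, "bus": 59.3626, "train": 56.3247,
    "motorcycle": 53.5447, "bicycle": 65.7977,
}

# iIoU is only listed for the eight classes with a value in the table.
class_iiou = {
    "person": 56.8646, "rider": 33.1569, "car": 85.5851,
    "truck": 19.2678, "bus": 33.2648, "train": 31.9048,
    "motorcycle": 27.0271, "bicycle": 44.4035,
}


def unweighted_mean(scores):
    """Arithmetic mean over all listed scores (assumed averaging rule)."""
    values = list(scores.values())
    return sum(values) / len(values)


mean_class_iou = unweighted_mean(class_iou)
mean_class_iiou = unweighted_mean(class_iiou)

print(f"IoU Classes:  {mean_class_iou:.4f}")
print(f"iIoU Classes: {mean_class_iiou:.4f}")
```

The weakest classes (wall, fence, truck) pull the mean well below the scores of the large, frequent classes such as road and sky.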

 

Category results

Category IoU (%) iIoU (%)
flat 98.2682 -
nature 91.4397 -
object 62.8912 -
sky 94.2527 -
construction 90.436 -
human 77.5916 58.3859
vehicle 92.453 83.2974
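
The category averages can be checked the same way. This sketch uses `statistics.fmean` from the Python standard library and again assumes an unweighted mean over the categories that have a score (7 for IoU, 2 for iIoU), which is an assumption rather than something this page states.

```python
from statistics import fmean

# Per-category scores copied from the category results table (percent).
category_iou = {
    "flat": 98.2682, "nature": 91.4397, "object": 62.8912,
    "sky": 94.2527, "construction": 90.436,
    "human": 77.5916, "vehicle": 92.453,
}

# iIoU is only listed for the two categories with a value in the table.
category_iiou = {"human": 58.3859, "vehicle": 83.2974}

mean_category_iou = fmean(category_iou.values())
mean_category_iiou = fmean(category_iiou.values())

# Reported averages from the "Average results" table, for comparison.
reported_iou, reported_iiou = 86.7618, 70.8417

print(f"IoU Categories:  {mean_category_iou:.4f} (reported {reported_iou})")
print(f"iIoU Categories: {mean_category_iiou:.4f} (reported {reported_iiou})")
```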

 
