Building footprints are an essential input for a wide range of tasks, including 3D model creation, city mapping, urban planning, change detection, and population density estimation. Delineating each building outline by hand is time-intensive and impractical at scale. Growth in computational capability has driven advances in deep learning and its application in domains ranging from medical image analysis to remote sensing. Deep convolutional neural networks can now classify an input image at the pixel level, which makes it possible to extract particular features from images using a technique known as semantic segmentation. Encoder-decoder deep learning architectures have been used successfully to extract building footprints automatically from satellite images. In this paper, we show how U-Net, an encoder-decoder deep learning architecture, can be coupled with well-known deep convolutional network architectures such as VGG, ResNet, and Inception to achieve an Intersection over Union (IoU) of 0.91 on high-resolution optical satellite data from the WorldView-3 satellite for the task of automatic building footprint extraction.
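The IoU figure quoted above compares a predicted building mask with a reference mask pixel by pixel. The following is a minimal sketch of that metric and not the evaluation code used in the paper: the function name `iou`, the nested-list mask format, and the toy masks are assumptions made for illustration, as is the convention that two empty masks score 1.0.

```python
def iou(pred, truth):
    """Intersection over Union for two binary masks (nested lists of 0/1)."""
    same_rows = len(pred) == len(truth)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as building in both masks
    union = 0         # pixels marked as building in at least one mask
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else intersection / union


# Hypothetical 3x4 masks: the prediction misses one building pixel.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
reference = [
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

score = iou(predicted, reference)
print(score)  # 4 shared pixels / 5 pixels in the union = 0.8
```

In practice the masks would come from thresholding the per-pixel output of the segmentation network, and the score is usually averaged over a test set.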
Kriti Rastogi, Pankaj Bodani, Shashikant A. Sharma
Hoshang J. Khdir, Haval AbdulJabbar Sadeq
Shaker F. Ahmed, Adel H. EL-Shazely, Wael Ahmed
Jian Kang, Rubén Fernández-Beltrán, Xian Sun, Jingen Ni, Antonio Plaza