SEMANTIC LABELING OF STRUCTURAL ELEMENTS IN BUILDINGS BY FUSING RGB AND DEPTH IMAGES IN AN ENCODER-DECODER CNN FRAMEWORK

Authors

Iwaszczuk D., Koppanyi Z., Gard N. A., Zha B., Toth C., Yilmaz A.

Abstract

In the last decade, demand for indoor scene modeling has grown in applications such as mobility inside buildings, emergency and rescue operations, and maintenance. Automatically distinguishing structural elements of buildings (walls, ceilings, floors, windows, doors, etc.) from typical objects in buildings (chairs, tables, shelves) is particularly important for tasks such as 3D building modeling and navigation. This information can generally be retrieved through semantic labeling. In the past few years, convolutional neural networks (CNNs) have become the preferred method for semantic labeling, and there is ongoing research on fusing RGB and depth images in CNN frameworks. For pixel-level labeling, encoder-decoder CNN frameworks have been shown to be the most effective. In this study, we adopt an encoder-decoder CNN architecture to label structural elements in buildings and investigate the influence of depth information on the detection of typical objects in buildings. For this purpose, we introduce an approach that combines the depth map with the RGB image: the original image is converted to the HSV color space, the V channel is substituted with the depth information (D), and the resulting HSD image is used as input to the CNN. As a further variation of this approach, we also transform the HSD images back to the RGB color space and use them within the CNN. This allows us to use a CNN designed for three-channel image input and to compare our results directly with RGB-based labeling within the same network. We perform our tests on the Stanford 2D-3D-Semantics Dataset (2D-3D-S), a widely used indoor dataset. Furthermore, we compare our approach with results obtained using a four-channel input created by stacking RGB and depth (RGBD). Our investigation shows that fusing RGB and depth improves semantic labeling results, particularly for structural elements of buildings. On the 2D-3D-S dataset, we achieve up to 92.1% global accuracy, compared to 90.9% using RGB only and 93.6% using RGBD. Moreover, the Intersection over Union scores improve when depth is used, which indicates better labeling results at the boundaries.
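The fusion step described in the abstract (convert RGB to HSV, substitute V with depth, optionally convert back to RGB) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the nested-list image layout, and the min-max depth normalization to [0, 1] are assumptions made for illustration, since the abstract does not say how depth is scaled. Only Python's standard `colorsys` module is used.

```python
import colorsys

def normalize_depth(depth):
    """Min-max scale a 2D list of depth values to [0, 1].
    Assumption: the abstract does not specify the depth scaling."""
    flat = [d for row in depth for d in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat depth map
    return [[(d - lo) / span for d in row] for row in depth]

def rgb_to_hsd(rgb, depth01):
    """Convert each RGB pixel (floats in [0, 1]) to HSV and
    replace the V channel with the normalized depth D."""
    hsd = []
    for rgb_row, depth_row in zip(rgb, depth01):
        row = []
        for (r, g, b), d in zip(rgb_row, depth_row):
            h, s, _v = colorsys.rgb_to_hsv(r, g, b)
            row.append((h, s, d))
        hsd.append(row)
    return hsd

def hsd_to_rgb(hsd):
    """Variation: read (H, S, D) as HSV and convert back to RGB,
    giving a three-channel image that carries depth as brightness."""
    return [[colorsys.hsv_to_rgb(h, s, d) for (h, s, d) in row] for row in hsd]

def stack_rgbd(rgb, depth01):
    """Baseline for comparison: four-channel RGBD pixels."""
    return [[(r, g, b, d) for (r, g, b), d in zip(rgb_row, depth_row)]
            for rgb_row, depth_row in zip(rgb, depth01)]

if __name__ == "__main__":
    # Hypothetical 1x2 image: a gray pixel and a red pixel, with raw depths.
    image = [[(0.5, 0.5, 0.5), (1.0, 0.0, 0.0)]]
    depth = normalize_depth([[2.0, 4.0]])
    print(rgb_to_hsd(image, depth))
    print(hsd_to_rgb(rgb_to_hsd(image, depth)))
    print(stack_rgbd(image, depth))
```

Both `rgb_to_hsd` and `hsd_to_rgb` keep the input at three channels, which is what lets the same network be used for the RGB and fused variants; `stack_rgbd` requires a network whose first layer accepts four channels.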

Publisher

Copernicus GmbH

Cited by 4 articles.

1. Hybrid self-supervised learning-based architecture for construction progress monitoring. Automation in Construction, 2024-02.

2. Exploring fusion techniques in U-Net and DeepLab V3 architectures for multi-modal land cover classification. Earth Resources and Environmental Remote Sensing/GIS Applications XIII, 2022-10-26.

3. Computer vision-based construction progress monitoring. Automation in Construction, 2022-06.

4. CNN-Based Obstacle Avoidance Using RGB-Depth Image Fusion. Lecture Notes in Electrical Engineering, 2021-07-22.
