Describe the house and I will tell you the price: House price prediction with textual description data

Author:

Zhang Hanxiang, Li Yansong, Branco Paula

Abstract

House price prediction is an important problem that could benefit home buyers and sellers. Traditional models for house price prediction use numerical attributes, such as the number of rooms, but disregard the house description text. Recent developments in text processing suggest that descriptions can be valuable attributes, which motivated us to use them. This paper focuses on the house asking/advertising price and studies the impact of using house description texts to predict the final house price. To achieve this, we collected a large and diverse set of attributes on house postings, including the house advertising price. We then compare the performance of three scenarios: using only the house description, using only numeric attributes, or using both. We processed the description text through three word embedding techniques: TF-IDF, Word2Vec, and BERT. Four regression algorithms are trained in each of the three scenarios. Our results show that using exclusively the description data with Word2Vec and a Deep Learning model achieves good performance: this model reaches an $R^2$ of 0.7904 on the testing data, which indicates that the house description text alone is a strong predictor of the house price. However, when observing the RMSE on the test data, the best model was gradient boosting using both numeric and description data. Overall, we observe that combining the textual and non-textual features improves the learned model and provides performance benefits over using only one of the feature types. We also provide a freely available application for house price prediction, which is based solely on a house text description and uses our final model with Word2Vec and Deep Learning to predict the house price.
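To make two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the classical TF-IDF weighting (term frequency times log(N / df)) with whitespace tokenization, and it computes RMSE and $R^2$ from their textbook definitions. The function names, listing descriptions, prices, and predictions are all hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumption: weight = (count / doc length) * log(N / df), the classical
    unsmoothed variant; libraries often use smoothed formulas instead.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (price)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy listings, prices, and model predictions.
descriptions = [
    "bright renovated condo near downtown",
    "spacious family home with large backyard",
    "renovated family home near park",
]
prices = [300000.0, 550000.0, 480000.0]
predicted = [320000.0, 530000.0, 470000.0]

vectors = tfidf(descriptions)
print(sorted(vectors[0].items(), key=lambda kv: -kv[1]))
print("RMSE:", rmse(prices, predicted))
print("R^2:", r2(prices, predicted))
```

Terms that occur in a single description, such as "bright", receive a higher weight than terms shared across descriptions, such as "renovated", which is the property that lets a regression model pick up on distinctive wording.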

Publisher

Cambridge University Press (CUP)

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software


Cited by 6 articles.

1. Scalable multimodal assessment of the micro-neighborhood using orthogonal visual inputs;Journal of Housing and the Built Environment;2024-08-19

2. Enhancing Zillow Zestimates: Leveraging Machine-Learning for Precise Property Valuation Predictions;2024 IEEE Students Conference on Engineering and Systems (SCES);2024-06-21

3. Identifying the Current Status of Real Estate Appraisal Methods;Real Estate Management and Valuation;2024-06-07

4. Housing Price Prediction Using Machine Learning Techniques;2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG);2024-04-02

5. Housing Rental Information Management and Prediction System Based on CatBoost Algorithm - a Case Study of Halifax Region;Lecture Notes in Computer Science;2024
