Predicting Housing Prices in Istanbul Using Explainable Artificial Intelligence Techniques

Hale Uysal, Adnan Kalkan

Abstract

Accurate prediction of housing prices in dynamic markets such as Istanbul is crucial for stakeholders in the real estate industry, yet traditional models often lack transparency and interpretability. This study addresses that gap by integrating artificial intelligence (AI) with explainable artificial intelligence (XAI) techniques to predict prices in the Istanbul housing market. Using a dataset of 25,154 entries and 37 features from the Kaggle platform, we trained several machine learning models, including Random Forest Regressor, Linear Regression, KNeighbors Regressor, Decision Tree Regressor, Gradient Boosting Regressor, and Ridge Regressor. Data preprocessing covered handling missing values, detecting outliers, and encoding categorical variables to ensure data quality. The Random Forest model, optimized through hyperparameter tuning, performed best, with an R² score of 0.8683 on the test set. To make the model interpretable, we applied XAI methods such as SHAP and LIME, which showed that gross square meters and location (specifically, districts such as Kadıköy and Sarıyer) significantly affect housing prices. These findings align with the existing literature and offer actionable insights for policymakers and industry professionals. The research underscores the value of combining AI with XAI to build transparent, reliable models that support data-driven decision-making in the real estate sector.
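
For readers unfamiliar with the reported metric, the following is a minimal sketch of how an R² score (coefficient of determination) is computed from actual and predicted values. The function name `r2_score` and the sample prices are illustrative assumptions; the values are hypothetical and are not drawn from the study's dataset or code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean price.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical prices in arbitrary units (assumption, not study data).
actual = [100.0, 150.0, 200.0, 250.0, 300.0]
predicted = [110.0, 140.0, 210.0, 240.0, 290.0]

score = r2_score(actual, predicted)
print(round(score, 4))  # prints 0.98
```

A score of 1 means the predictions match the observed values exactly, while 0 means the model does no better than always predicting the mean. Read this way, the reported 0.8683 indicates that the tuned Random Forest accounts for roughly 87% of the variance in test-set prices.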

How to Cite
UYSAL, Hale; KALKAN, Adnan. Predicting Housing Prices in Istanbul Using Explainable Artificial Intelligence Techniques. Journal of Multidisciplinary Developments, [S.l.], v. 9, n. 1, p. 19-34, July 2024. ISSN 2564-6095. Available at: <http://jomude.com/index.php/jomude/article/view/114>. Date accessed: 12 Mar. 2025.
Section
Natural Sciences - Regular Research Paper