
Article

Title: A robust coefficient of determination based on implicit weighting (English)
Author: Kalina, Jan
Language: English
Journal: Applications of Mathematics
ISSN: 0862-7940 (print)
ISSN: 1572-9109 (online)
Volume: 70
Issue: 5
Year: 2025
Pages: 647-670
Summary lang: English
Category: math
Summary: In the linear regression model, the standard coefficient of determination $R^2$ and its weighted counterpart are commonly used to assess the quality of the linear fit. However, both metrics are susceptible to the influence of outliers and heteroskedasticity within the dataset. This paper introduces a robust version of $R^2$, based on the least weighted squares (LWS) estimator, and examines its statistical properties in detail. We investigate the impact of data quantization on $R^2$ and its robust variants, and propose a hypothesis test for assessing the equality of expected values between two $R^2$ versions. Numerical experiments on 29 publicly available datasets reveal that confidence intervals for the LWS-based coefficient of determination are generally narrower than those for existing measures, especially in homoskedastic settings. In contrast, under heteroskedasticity, narrower intervals do not necessarily imply greater robustness, highlighting the nuanced behavior of these estimators. The comparison with the well-known least trimmed squares (LTS) estimator underscores the promise of the LWS approach, which exhibits favorable efficiency properties and more reliable interval estimation in many practical scenarios. (English)
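The quantities discussed in the summary can be illustrated with a small self-contained sketch (simulated data, not the paper's 29 datasets, and not the paper's exact weight function): a weighted coefficient of determination, an LWS-style fit in which weights are assigned implicitly to the *ranks* of squared residuals (here a simple zero-one weight keeping the $h$ smallest, which coincides with LTS-type trimming; the LWS family allows general nonincreasing rank weights), and a nonparametric pairs-bootstrap percentile interval for $R^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated regression data with 5 % vertical outliers (illustration only).
n = 200
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=n)
y[:10] += 8.0  # contaminate the first ten responses

def fit_wls(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    s = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * s[:, None], y * s, rcond=None)
    return beta

def r2_weighted(x, y, w):
    """Weighted coefficient of determination: 1 - SSE_w / SST_w."""
    b0, b1 = fit_wls(x, y, w)
    resid = y - (b0 + b1 * x)
    ybar = np.average(y, weights=w)
    return 1.0 - np.sum(w * resid**2) / np.sum(w * (y - ybar)**2)

def r2_lws(x, y, h_frac=0.75, n_iter=20):
    """LWS-style R^2 sketch: weights depend on residual ranks, not magnitudes.
    A zero-one rank weight (keep the h smallest squared residuals) is used
    here for simplicity; it reduces to LTS-type trimming."""
    w = np.ones_like(y)
    h = int(h_frac * len(y))
    for _ in range(n_iter):
        b0, b1 = fit_wls(x, y, w)
        resid2 = (y - (b0 + b1 * x))**2
        ranks = np.argsort(np.argsort(resid2))  # rank of each squared residual
        w = (ranks < h).astype(float)           # implicit weighting by rank
    return r2_weighted(x, y, w), w

def bootstrap_ci(x, y, stat, B=300, alpha=0.05):
    """Nonparametric (pairs) bootstrap percentile interval for a statistic."""
    n = len(y)
    vals = [stat(x[i], y[i]) for i in (rng.integers(0, n, size=n) for _ in range(B))]
    return np.quantile(vals, [alpha / 2, 1 - alpha / 2])

r2_classical = r2_weighted(x, y, np.ones_like(y))
r2_robust, _ = r2_lws(x, y)
ci = bootstrap_ci(x, y, lambda a, b: r2_weighted(a, b, np.ones_like(b)))
print("classical R^2:", r2_classical)
print("LWS-style R^2:", r2_robust)
print("bootstrap 95% CI for classical R^2:", ci)
```

On contaminated data the rank-weighted fit discards the outlying observations, so the robust $R^2$ recovers the fit quality of the clean majority while the classical $R^2$ is dragged down.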
Keyword: coefficient of determination
Keyword: outlier
Keyword: robustness
Keyword: least weighted squares
Keyword: quantization
Keyword: nonparametric bootstrap
MSC: 62F03
MSC: 62J05
DOI: 10.21136/AM.2025.0105-25
Date available: 2025-11-07T17:09:45Z
Last updated: 2025-11-16
Stable URL: http://hdl.handle.net/10338.dmlcz/153153
Reference: [1] Alfons, A., Croux, C., Gelper, S.: Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Ann. Appl. Stat. 7 (2013), 226-248. Zbl 1454.62123, MR 3086417, 10.1214/12-AOAS575
Reference: [2] Armeniakos, G., Zervakis, G., Soudris, D., Henkel, J.: Hardware approximate techniques for deep neural network accelerators: A survey. ACM Comput. Surv. 55 (2022), Article ID 83, 36 pages. 10.1145/352715
Reference: [3] Bonhomme, S., Weidner, M.: Minimizing sensitivity to model misspecification. Quant. Econ. 13 (2022), 907-954. Zbl 07766822, MR 4480418, 10.3982/QE1930
Reference: [4] California housing dataset. Available at https://www.kaggle.com/datasets/camnugent/california-housing-prices (2023).
Reference: [5] Chatterjee, S., Hadi, A. S.: Regression Analysis by Example. Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken (2012). Zbl 1263.62099, 10.1002/0470055464
Reference: [6] Chicco, D., Warrens, M. J., Jurman, G.: The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 7 (2021), Article ID e623, 28 pages. 10.7717/peerj-cs.623
Reference: [7] Čížek, P.: General trimmed estimation: Robust approach to nonlinear and limited dependent variable models. Econom. Theory 24 (2008), 1500-1529. Zbl 1231.62026, MR 2456536, 10.1017/S0266466608080596
Reference: [8] Čížek, P.: Semiparametrically weighted robust estimation of regression models. Comput. Stat. Data Anal. 55 (2011), 774-788. Zbl 1247.62115, MR 2736596, 10.1016/j.csda.2010.06.024
Reference: [9] Crotti, R., Misrahi, T. (eds.): The Travel & Tourism Competitiveness Report 2015: Growth Through Shocks. World Economic Forum, Geneva (2015). Available at https://www.weforum.org/publications/travel-and-tourism-competitiveness-report-2015/.
Reference: [10] Deb, S.: A novel robust R-squared measure and its applications in linear regression. Computational Intelligence in Information Systems. Advances in Intelligent Systems and Computing 532. Springer, Cham (2016), 131-142. 10.1007/978-3-319-48517-1_12
Reference: [11] Dikta, G., Scheer, M.: Bootstrap Methods: With Applications in R. Springer, Cham (2021). Zbl 1467.62004, MR 4306577, 10.1007/978-3-030-73480-0
Reference: [12] Efron, B., Narasimhan, B.: The automatic construction of bootstrap confidence intervals. J. Comput. Graph. Stat. 29 (2020), 608-619. Zbl 07499300, MR 4153185, 10.1080/10618600.2020.1714633
Reference: [13] Gelman, A., Hill, J.: Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge (2006). 10.1017/CBO9780511790942
Reference: [14] Greene, W. H.: Econometric Analysis. Pearson Education, Harlow (2018).
Reference: [15] Grün, B., Miljkovic, T.: The automated bias-corrected and accelerated bootstrap confidence intervals for risk measures. North Am. Actuar. J. 27 (2023), 731-750. Zbl 1534.91121, MR 4684482, 10.1080/10920277.2022.2141781
Reference: [16] Jurečková, J., Picek, J., Schindler, M.: Robust Statistical Methods with R. CRC Press, Boca Raton (2019). Zbl 1411.62003, MR 3967085, 10.1201/b21993
Reference: [17] Kalina, J.: A robust pre-processing of BeadChip microarray images. Biocybernet. Biomedic. Eng. 38 (2018), 556-563. 10.1016/j.bbe.2018.04.005
Reference: [18] Kalina, J.: Exploring the impact of post-training rounding in regression models. Appl. Math., Praha 69 (2024), 257-271. Zbl 07893334, MR 4728194, 10.21136/AM.2024.0090-23
Reference: [19] Kalina, J.: Regularized least weighted squares estimator in linear regression. Commun. Stat., Simulation Comput. 54 (2025), 1890-1900. MR 4928083, 10.1080/03610918.2023.2300356
Reference: [20] Kalina, J., Matonoha, C.: A sparse pair-preserving centroid-based supervised learning method for high-dimensional biomedical data or images. Biocybernet. Biomedic. Eng. 40 (2020), 774-786. 10.1016/j.bbe.2020.03.008
Reference: [21] Kelly, M., Longjohn, R., Nottingham, K.: UCI Machine Learning Repository. Available at https://archive.ics.uci.edu (2013).
Reference: [22] Kleijnen, J. P. C., Deflandre, D.: Validation of regression metamodels in simulation: Bootstrap approach. Eur. J. Oper. Res. 170 (2006), 120-131. Zbl 1330.62201, MR 2172734, 10.1016/j.ejor.2004.06.018
Reference: [23] Kmenta, J.: Elements of Econometrics. Macmillan, New York (1986). Zbl 0935.62129, MR 1600099, 10.3998/mpub.15701
Reference: [24] Lourenço, V. M., Rodrigues, P. C., Pires, A. M., Piepho, H.-P.: A robust DF-REML framework for variance components estimation in genetic studies. Bioinform. 33 (2017), 3584-3594. 10.1093/bioinformatics/btx457
Reference: [25] Maechler, M., et al.: Robustbase: Basic Robust Statistics. R package version 0.92-7. Available at https://cran.r-project.org/package=robustbase (2016).
Reference: [26] Maronna, R. A., Martin, R. D., Yohai, V. J., Salibián-Barrera, M.: Robust Statistics: Theory and Methods (with R). Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken (2019). Zbl 1409.62009, MR 3839299, 10.1002/9781119214656
Reference: [27] Mittal, M., Satapathy, S. C., Pal, V., Agarwal, B., Goyal, L. M., Parwekar, P.: Prediction of coefficient of consolidation in soil using machine learning techniques. Microprocessors Microsyst. 82 (2021), Article ID 103830, 15 pages. 10.1016/j.micpro.2021.103830
Reference: [28] Nagelkerke, N. J. D.: A note on a general definition of the coefficient of determination. Biometrika 78 (1991), 691-692. Zbl 0741.62069, MR 1130937, 10.1093/biomet/78.3.691
Reference: [29] Noma, H., Shinozaki, T., Iba, K., Teramukai, S., Furukawa, T. A.: Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap-based optimism correction methods. Stat. Med. 40 (2021), 5691-5701. Zbl 1546.62560, MR 4330574, 10.1002/sim.9148
Reference: [30] Ohtani, K.: Bootstrapping $R^2$ and adjusted $R^2$ in regression analysis. Econom. Modelling 17 (2000), 473-483. 10.1016/S0264-9993(99)00034-6
Reference: [31] Raymaekers, J., Rousseeuw, P. J.: Fast robust correlation for high-dimensional data. Technometrics 63 (2021), 184-198. Zbl 07937929, MR 4251493, 10.1080/00401706.2019.1677270
Reference: [32] Renaud, O., Victoria-Feser, M.-P.: A robust coefficient of determination for regression. J. Stat. Plann. Inference 140 (2010), 1852-1862. Zbl 1184.62119, MR 2606723, 10.1016/j.jspi.2010.01.008
Reference: [33] Rousseeuw, P. J., Van Driessen, K.: Computing LTS regression for large data sets. Data Min. Knowl. Discov. 12 (2006), 29-45. MR 2225526, 10.1007/s10618-005-0024-4
Reference: [34] Salibian-Barrera, M., Zamar, R. H.: Bootstrapping robust estimates of regression. Ann. Stat. 30 (2002), 556-582. Zbl 1012.62028, MR 1902899, 10.1214/aos/1021379865
Reference: [35] Schneider, C., Vybíral, J.: A multivariate Riesz basis of ReLU neural networks. Appl. Comput. Harmon. Anal. 68 (2024), Article ID 101605, 16 pages. Zbl 1532.68101, MR 4659237, 10.1016/j.acha.2023.101605
Reference: [36] Shao, J.: Asymptotic distribution of the weighted least squares estimator. Ann. Inst. Stat. Math. 41 (1989), 365-382. Zbl 0692.62012, MR 1006496, 10.1007/BF00049402
Reference: [37] Shevlyakov, G. L., Vilchevski, N. O.: Robustness in Data Analysis: Criteria and Methods. Modern Probability and Statistics. VSP, Utrecht (2002).
Reference: [38] Späth, H.: Mathematical Algorithms for Linear Regression. Academic Press, Boston (1991).
Reference: [39] Su, P., Tarr, G., Muller, S.: Robust variable selection under cellwise contamination. J. Stat. Comput. Simulation 94 (2024), 1371-1387. Zbl 07895654, MR 4729530, 10.1080/00949655.2023.2286316
Reference: [40] Sze, V., Chen, Y.-H., Yang, T.-J., Emer, J. S.: Efficient processing of deep neural networks. Synthesis Lectures on Computer Architecture 51. Morgan & Claypool Publishers, San Rafael (2020). Zbl 1437.68006, 10.1007/978-3-031-01766-7
Reference: [41] Víšek, J. Á.: Consistency of the least weighted squares under heteroscedasticity. Kybernetika 47 (2011), 179-206. Zbl 1220.62064, MR 2828572
Reference: [42] Wickham, H., Gentleman, R., Ihaka, R., Chambers, J. M., Venables, W. N., Ripley, B. D.: R: The R Project for Statistical Computing. Available at https://www.r-project.org/ (2018).
Reference: [43] Yu, F. W., Ho, W. T., Chan, K. T., Sit, R. K. Y.: Critique of operating variables importance on chiller energy performance using random forest. Energy Buildings 139 (2017), 653-664. 10.1016/j.enbuild.2017.01.063

Fulltext not available (moving wall 24 months)
