| Title: | A robust coefficient of determination based on implicit weighting (English) |
| Author: | Kalina, Jan |
| Language: | English |
| Journal: | Applications of Mathematics |
| ISSN: | 0862-7940 (print) |
| ISSN: | 1572-9109 (online) |
| Volume: | 70 |
| Issue: | 5 |
| Year: | 2025 |
| Pages: | 647-670 |
| Summary lang: | English |
| . | |
| Category: | math |
| . | |
| Summary: | In the linear regression model, the standard coefficient of determination $R^2$ and its weighted counterpart are commonly used to assess the quality of the linear fit. However, both metrics are susceptible to the influence of outliers and heteroskedasticity within the dataset. This paper introduces a robust version of $R^2$, based on the least weighted squares (LWS) estimator, and examines its statistical properties in detail. We investigate the impact of data quantization on $R^2$ and its robust variants, and propose a hypothesis test for assessing the equality of expected values between two $R^2$ versions. Numerical experiments on 29 publicly available datasets reveal that confidence intervals for the LWS-based coefficient of determination are generally narrower than those for existing measures, especially in homoskedastic settings. In contrast, under heteroskedasticity, narrower intervals do not necessarily imply greater robustness, highlighting the nuanced behavior of these estimators. The comparison with the well-known least trimmed squares (LTS) estimator underscores the promise of the LWS approach, which exhibits favorable efficiency properties and more reliable interval estimation in many practical scenarios. (English) |
| Keyword: | coefficient of determination |
| Keyword: | outlier |
| Keyword: | robustness |
| Keyword: | least weighted squares |
| Keyword: | quantization |
| Keyword: | nonparametric bootstrap |
| MSC: | 62F03 |
| MSC: | 62J05 |
| DOI: | 10.21136/AM.2025.0105-25 |
| . | |
| Date available: | 2025-11-07T17:09:45Z |
| Last updated: | 2025-11-16 |
| Stable URL: | http://hdl.handle.net/10338.dmlcz/153153 |
| . | |
| Reference: | [1] Alfons, A., Croux, C., Gelper, S.: Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Ann. Appl. Stat. 7 (2013), 226-248. Zbl 1454.62123, MR 3086417, 10.1214/12-AOAS575 |
| Reference: | [2] Armeniakos, G., Zervakis, G., Soudris, D., Henkel, J.: Hardware approximate techniques for deep neural network accelerators: A survey. ACM Comput. Surv. 55 (2022), Article ID 83, 36 pages. 10.1145/352715 |
| Reference: | [3] Bonhomme, S., Weidner, M.: Minimizing sensitivity to model misspecification. Quant. Econ. 13 (2022), 907-954. Zbl 07766822, MR 4480418, 10.3982/QE1930 |
| Reference: | [4] California housing dataset. Available at https://www.kaggle.com/datasets/camnugent/california-housing-prices (2023). |
| Reference: | [5] Chatterjee, S., Hadi, A. S.: Regression Analysis by Example. Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken (2012). Zbl 1263.62099, 10.1002/0470055464 |
| Reference: | [6] Chicco, D., Warrens, M. J., Jurman, G.: The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 7 (2021), Article ID e623, 28 pages. 10.7717/peerj-cs.623 |
| Reference: | [7] Čížek, P.: General trimmed estimation: Robust approach to nonlinear and limited dependent variable models. Econom. Theory 24 (2008), 1500-1529. Zbl 1231.62026, MR 2456536, 10.1017/S0266466608080596 |
| Reference: | [8] Čížek, P.: Semiparametrically weighted robust estimation of regression models. Comput. Stat. Data Anal. 55 (2011), 774-788. Zbl 1247.62115, MR 2736596, 10.1016/j.csda.2010.06.024 |
| Reference: | [9] Crotti, R., Misrahi, T. (eds.): The Travel & Tourism Competitiveness Report 2015: Growth Through Shocks. World Economic Forum, Geneva (2015). Available at https://www.weforum.org/publications/travel-and-tourism-competitiveness-report-2015/. |
| Reference: | [10] Deb, S.: A novel robust R-squared measure and its applications in linear regression. Computational Intelligence in Information Systems. Advances in Intelligent Systems and Computing 532. Springer, Cham (2016), 131-142. 10.1007/978-3-319-48517-1_12 |
| Reference: | [11] Dikta, G., Scheer, M.: Bootstrap Methods: With Applications in R. Springer, Cham (2021). Zbl 1467.62004, MR 4306577, 10.1007/978-3-030-73480-0 |
| Reference: | [12] Efron, B., Narasimhan, B.: The automatic construction of bootstrap confidence intervals. J. Comput. Graph. Stat. 29 (2020), 608-619. Zbl 07499300, MR 4153185, 10.1080/10618600.2020.1714633 |
| Reference: | [13] Gelman, A., Hill, J.: Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge (2006). 10.1017/CBO9780511790942 |
| Reference: | [14] Greene, W. H.: Econometric Analysis. Pearson Education, Harlow (2018). |
| Reference: | [15] Grün, B., Miljkovic, T.: The automated bias-corrected and accelerated bootstrap confidence intervals for risk measures. North Am. Actuar. J. 27 (2023), 731-750. Zbl 1534.91121, MR 4684482, 10.1080/10920277.2022.2141781 |
| Reference: | [16] Jurečková, J., Picek, J., Schindler, M.: Robust Statistical Methods with R. CRC Press, Boca Raton (2019). Zbl 1411.62003, MR 3967085, 10.1201/b21993 |
| Reference: | [17] Kalina, J.: A robust pre-processing of BeadChip microarray images. Biocybernet. Biomedic. Eng. 38 (2018), 556-563. 10.1016/j.bbe.2018.04.005 |
| Reference: | [18] Kalina, J.: Exploring the impact of post-training rounding in regression models. Appl. Math., Praha 69 (2024), 257-271. Zbl 07893334, MR 4728194, 10.21136/AM.2024.0090-23 |
| Reference: | [19] Kalina, J.: Regularized least weighted squares estimator in linear regression. Commun. Stat., Simulation Comput. 54 (2025), 1890-1900. MR 4928083, 10.1080/03610918.2023.2300356 |
| Reference: | [20] Kalina, J., Matonoha, C.: A sparse pair-preserving centroid-based supervised learning method for high-dimensional biomedical data or images. Biocybernet. Biomedic. Eng. 40 (2020), 774-786. 10.1016/j.bbe.2020.03.008 |
| Reference: | [21] Kelly, M., Longjohn, R., Nottingham, K.: UCI Machine Learning Repository. Available at https://archive.ics.uci.edu (2013), sw 4074. |
| Reference: | [22] Kleijnen, J. P. C., Deflandre, D.: Validation of regression metamodels in simulation: Bootstrap approach. Eur. J. Oper. Res. 170 (2006), 120-131. Zbl 1330.62201, MR 2172734, 10.1016/j.ejor.2004.06.018 |
| Reference: | [23] Kmenta, J.: Elements of Econometrics. Macmillan, New York (1986). Zbl 0935.62129, MR 1600099, 10.3998/mpub.15701 |
| Reference: | [24] Lourenço, V. M., Rodrigues, P. C., Pires, A. M., Piepho, H.-P.: A robust DF-REML framework for variance components estimation in genetic studies. Bioinform. 33 (2017), 3584-3594. 10.1093/bioinformatics/btx457 |
| Reference: | [25] Maechler, M., et al.: Robustbase: Basic Robust Statistics. R Package Version 0.92-7. Available at https://cran.r-project.org/package=robustbase (2016). |
| Reference: | [26] Maronna, R. A., Martin, R. D., Yohai, V. J., Salibián-Barrera, M.: Robust Statistics: Theory and Methods (with R). Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken (2019). Zbl 1409.62009, MR 3839299, 10.1002/9781119214656 |
| Reference: | [27] Mittal, M., Satapathy, S. C., Pal, V., Agarwal, B., Goyal, L. M., Parwekar, P.: Prediction of coefficient of consolidation in soil using machine learning techniques. Microprocessors Microsyst. 82 (2021), Article ID 103830, 15 pages. 10.1016/j.micpro.2021.103830 |
| Reference: | [28] Nagelkerke, N. J. D.: A note on a general definition of the coefficient of determination. Biometrika 78 (1991), 691-692. Zbl 0741.62069, MR 1130937, 10.1093/biomet/78.3.691 |
| Reference: | [29] Noma, H., Shinozaki, T., Iba, K., Teramukai, S., Furukawa, T. A.: Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap-based optimism correction methods. Stat. Med. 40 (2021), 5691-5701. Zbl 1546.62560, MR 4330574, 10.1002/sim.9148 |
| Reference: | [30] Ohtani, K.: Bootstrapping $R^2$ and adjusted $R^2$ in regression analysis. Econom. Modelling 17 (2000), 473-483. 10.1016/S0264-9993(99)00034-6 |
| Reference: | [31] Raymaekers, J., Rousseeuw, P. J.: Fast robust correlation for high-dimensional data. Technometrics 63 (2021), 184-198. Zbl 07937929, MR 4251493, 10.1080/00401706.2019.1677270 |
| Reference: | [32] Renaud, O., Victoria-Feser, M.-P.: A robust coefficient of determination for regression. J. Stat. Plann. Inference 140 (2010), 1852-1862. Zbl 1184.62119, MR 2606723, 10.1016/j.jspi.2010.01.008 |
| Reference: | [33] Rousseeuw, P. J., Van Driessen, K.: Computing LTS regression for large data sets. Data Min. Knowl. Discov. 12 (2006), 29-45. MR 2225526, 10.1007/s10618-005-0024-4 |
| Reference: | [34] Salibian-Barrera, M., Zamar, R. H.: Bootstrapping robust estimates of regression. Ann. Stat. 30 (2002), 556-582. Zbl 1012.62028, MR 1902899, 10.1214/aos/1021379865 |
| Reference: | [35] Schneider, C., Vybíral, J.: A multivariate Riesz basis of ReLU neural networks. Appl. Comput. Harmon. Anal. 68 (2024), Article ID 101605, 16 pages. Zbl 1532.68101, MR 4659237, 10.1016/j.acha.2023.101605 |
| Reference: | [36] Shao, J.: Asymptotic distribution of the weighted least squares estimator. Ann. Inst. Stat. Math. 41 (1989), 365-382. Zbl 0692.62012, MR 1006496, 10.1007/BF00049402 |
| Reference: | [37] Shevlyakov, G. L., Vilchevski, N. O.: Robustness in Data Analysis: Criteria and Methods. Modern Probability and Statistics. VSP, Utrecht (2002). |
| Reference: | [38] Späth, H.: Mathematical Algorithms for Linear Regression. Academic Press, Boston (1991). |
| Reference: | [39] Su, P., Tarr, G., Muller, S.: Robust variable selection under cellwise contamination. J. Stat. Comput. Simulation 94 (2024), 1371-1387. Zbl 07895654, MR 4729530, 10.1080/00949655.2023.2286316 |
| Reference: | [40] Sze, V., Chen, Y.-H., Yang, T.-J., Emer, J. S.: Efficient processing of deep neural networks. Synthesis Lectures on Computer Architecture 51. Morgan & Claypool Publishers, San Rafael (2020). Zbl 1437.68006, 10.1007/978-3-031-01766-7 |
| Reference: | [41] Víšek, J. Á.: Consistency of the least weighted squares under heteroscedasticity. Kybernetika 47 (2011), 179-206. Zbl 1220.62064, MR 2828572 |
| Reference: | [42] Wickham, H., Gentleman, R., Ihaka, R., Chambers, J. M., Venables, W. N., Ripley, B. D.: R: The R Project for Statistical Computing. Available at https://www.r-project.org/ (2018). |
| Reference: | [43] Yu, F. W., Ho, W. T., Chan, K. T., Sit, R. K. Y.: Critique of operating variables importance on chiller energy performance using random forest. Energy Buildings 139 (2017), 653-664. 10.1016/j.enbuild.2017.01.063 |
| . | |
Fulltext not available (moving wall 24 months)