
Article

Keywords:
coefficient of determination; outlier; robustness; least weighted squares; quantization; nonparametric bootstrap
Summary:
In the linear regression model, the standard coefficient of determination $R^2$ and its weighted counterpart are commonly used to assess the quality of the linear fit. However, both metrics are susceptible to the influence of outliers and heteroskedasticity within the dataset. This paper introduces a robust version of $R^2$, based on the least weighted squares (LWS) estimator, and examines its statistical properties in detail. We investigate the impact of data quantization on $R^2$ and its robust variants, and propose a hypothesis test for assessing the equality of expected values between two $R^2$ versions. Numerical experiments on 29 publicly available datasets reveal that confidence intervals for the LWS-based coefficient of determination are generally narrower than those for existing measures, especially in homoskedastic settings. In contrast, under heteroskedasticity, narrower intervals do not necessarily imply greater robustness, highlighting the nuanced behavior of these estimators. The comparison with the well-known least trimmed squares (LTS) estimator underscores the promise of the LWS approach, which exhibits favorable efficiency properties and more reliable interval estimation in many practical scenarios.
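To make the ideas above concrete, the following is a minimal illustrative sketch (not the paper's exact estimator): a trimmed, LWS-flavoured coefficient of determination in which observations are ranked by squared residual and the largest ones receive zero weight, together with a nonparametric percentile bootstrap confidence interval obtained by resampling observation pairs. All function names, the choice of zero-one weights, and the tuning constants (`keep_frac`, `n_iter`, `n_boot`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lws_style_r2(X, y, keep_frac=0.75, n_iter=10):
    """Crude LWS-style R^2: iteratively fit weighted least squares, rank squared
    residuals, keep the keep_frac smallest observations, and refit.
    (Illustrative zero-one weights; the LWS estimator uses a general weight
    function applied to ranked residuals.)"""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])   # design matrix with intercept
    h = int(np.ceil(keep_frac * n))         # number of observations to keep
    w = np.ones(n)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
        resid2 = (y - Xd @ beta) ** 2
        order = np.argsort(resid2)          # rank observations by squared residual
        w = np.zeros(n)
        w[order[:h]] = 1.0                  # keep the h best-fitting observations
    # final weighted fit on the selected observations
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    yhat = Xd @ beta
    ybar = np.average(y, weights=w)
    ss_res = np.sum(w * (y - yhat) ** 2)
    ss_tot = np.sum(w * (y - ybar) ** 2)
    return 1.0 - ss_res / ss_tot

def bootstrap_ci(X, y, stat, n_boot=500, alpha=0.05, seed=0):
    """Nonparametric percentile bootstrap CI: resample (x_i, y_i) pairs with
    replacement and take empirical quantiles of the recomputed statistic."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = [stat(X[idx], y[idx])
             for idx in (rng.integers(0, n, n) for _ in range(n_boot))]
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])
```

On data with a linear signal plus a handful of gross outliers, the trimmed statistic stays close to the R² of the clean portion, while the classical R² collapses; the width of the bootstrap interval is the kind of quantity the paper compares across estimators.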
References:
[1] Alfons, A., Croux, C., Gelper, S.: Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Ann. Appl. Stat. 7 (2013), 226-248. DOI 10.1214/12-AOAS575 | MR 3086417 | Zbl 1454.62123
[2] Armeniakos, G., Zervakis, G., Soudris, D., Henkel, J.: Hardware approximate techniques for deep neural network accelerators: A survey. ACM Comput. Surv. 55 (2022), Article ID 83, 36 pages. DOI 10.1145/352715
[3] Bonhomme, S., Weidner, M.: Minimizing sensitivity to model misspecification. Quant. Econ. 13 (2022), 907-954. DOI 10.3982/QE1930 | MR 4480418 | Zbl 07766822
[4] California housing dataset. Available at https://www.kaggle.com/datasets/camnugent/california-housing-prices (2023).
[5] Chatterjee, S., Hadi, A. S.: Regression Analysis by Example. Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken (2012). DOI 10.1002/0470055464 | Zbl 1263.62099
[6] Chicco, D., Warrens, M. J., Jurman, G.: The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 7 (2021), Article ID e623, 28 pages. DOI 10.7717/peerj-cs.623
[7] Čížek, P.: General trimmed estimation: Robust approach to nonlinear and limited dependent variable models. Econom. Theory 24 (2008), 1500-1529. DOI 10.1017/S0266466608080596 | MR 2456536 | Zbl 1231.62026
[8] Čížek, P.: Semiparametrically weighted robust estimation of regression models. Comput. Stat. Data Anal. 55 (2011), 774-788. DOI 10.1016/j.csda.2010.06.024 | MR 2736596 | Zbl 1247.62115
[9] Crotti, R., Misrahi, T. (eds.): The Travel & Tourism Competitiveness Report 2015: Growth Through Shocks. World Economic Forum, Geneva (2015). Available at https://www.weforum.org/publications/travel-and-tourism-competitiveness-report-2015/
[10] Deb, S.: A novel robust R-squared measure and its applications in linear regression. Computational Intelligence in Information Systems. Advances in Intelligent Systems and Computing 532. Springer, Cham (2016), 131-142. DOI 10.1007/978-3-319-48517-1_12
[11] Dikta, G., Scheer, M.: Bootstrap Methods: With Applications in R. Springer, Cham (2021). DOI 10.1007/978-3-030-73480-0 | MR 4306577 | Zbl 1467.62004
[12] Efron, B., Narasimhan, B.: The automatic construction of bootstrap confidence intervals. J. Comput. Graph. Stat. 29 (2020), 608-619. DOI 10.1080/10618600.2020.1714633 | MR 4153185 | Zbl 07499300
[13] Gelman, A., Hill, J.: Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge (2006). DOI 10.1017/CBO9780511790942
[14] Greene, W. H.: Econometric Analysis. Pearson Education, Harlow (2018).
[15] Grün, B., Miljkovic, T.: The automated bias-corrected and accelerated bootstrap confidence intervals for risk measures. North Am. Actuar. J. 27 (2023), 731-750. DOI 10.1080/10920277.2022.2141781 | MR 4684482 | Zbl 1534.91121
[16] Jurečková, J., Picek, J., Schindler, M.: Robust Statistical Methods with R. CRC Press, Boca Raton (2019). DOI 10.1201/b21993 | MR 3967085 | Zbl 1411.62003
[17] Kalina, J.: A robust pre-processing of BeadChip microarray images. Biocybernet. Biomedic. Eng. 38 (2018), 556-563. DOI 10.1016/j.bbe.2018.04.005
[18] Kalina, J.: Exploring the impact of post-training rounding in regression models. Appl. Math., Praha 69 (2024), 257-271. DOI 10.21136/AM.2024.0090-23 | MR 4728194 | Zbl 07893334
[19] Kalina, J.: Regularized least weighted squares estimator in linear regression. Commun. Stat., Simulation Comput. 54 (2025), 1890-1900. DOI 10.1080/03610918.2023.2300356 | MR 4928083
[20] Kalina, J., Matonoha, C.: A sparse pair-preserving centroid-based supervised learning method for high-dimensional biomedical data or images. Biocybernet. Biomedic. Eng. 40 (2020), 774-786. DOI 10.1016/j.bbe.2020.03.008
[21] Kelly, M., Longjohn, R., Nottingham, K.: UCI Machine Learning Repository. Available at https://archive.ics.uci.edu (2013).
[22] Kleijnen, J. P. C., Deflandre, D.: Validation of regression metamodels in simulation: Bootstrap approach. Eur. J. Oper. Res. 170 (2006), 120-131. DOI 10.1016/j.ejor.2004.06.018 | MR 2172734 | Zbl 1330.62201
[23] Kmenta, J.: Elements of Econometrics. Macmillan, New York (1986). DOI 10.3998/mpub.15701 | MR 1600099 | Zbl 0935.62129
[24] Lourenço, V. M., Rodrigues, P. C., Pires, A. M., Piepho, H.-P.: A robust DF-REML framework for variance components estimation in genetic studies. Bioinform. 33 (2017), 3584-3594. DOI 10.1093/bioinformatics/btx457
[25] Maechler, M., et al.: robustbase: Basic Robust Statistics. R package version 0.92-7. Available at https://cran.r-project.org/package=robustbase (2016).
[26] Maronna, R. A., Martin, R. D., Yohai, V. J., Salibián-Barrera, M.: Robust Statistics: Theory and Methods (with R). Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken (2019). DOI 10.1002/9781119214656 | MR 3839299 | Zbl 1409.62009
[27] Mittal, M., Satapathy, S. C., Pal, V., Agarwal, B., Goyal, L. M., Parwekar, P.: Prediction of coefficient of consolidation in soil using machine learning techniques. Microprocessors Microsyst. 82 (2021), Article ID 103830, 15 pages. DOI 10.1016/j.micpro.2021.103830
[28] Nagelkerke, N. J. D.: A note on a general definition of the coefficient of determination. Biometrika 78 (1991), 691-692. DOI 10.1093/biomet/78.3.691 | MR 1130937 | Zbl 0741.62069
[29] Noma, H., Shinozaki, T., Iba, K., Teramukai, S., Furukawa, T. A.: Confidence intervals of prediction accuracy measures for multivariable prediction models based on the bootstrap-based optimism correction methods. Stat. Med. 40 (2021), 5691-5701. DOI 10.1002/sim.9148 | MR 4330574 | Zbl 1546.62560
[30] Ohtani, K.: Bootstrapping $R^2$ and adjusted $R^2$ in regression analysis. Econom. Modelling 17 (2000), 473-483. DOI 10.1016/S0264-9993(99)00034-6
[31] Raymaekers, J., Rousseeuw, P. J.: Fast robust correlation for high-dimensional data. Technometrics 63 (2021), 184-198. DOI 10.1080/00401706.2019.1677270 | MR 4251493 | Zbl 07937929
[32] Renaud, O., Victoria-Feser, M.-P.: A robust coefficient of determination for regression. J. Stat. Plann. Inference 140 (2010), 1852-1862. DOI 10.1016/j.jspi.2010.01.008 | MR 2606723 | Zbl 1184.62119
[33] Rousseeuw, P. J., Van Driessen, K.: Computing LTS regression for large data sets. Data Min. Knowl. Discov. 12 (2006), 29-45. DOI 10.1007/s10618-005-0024-4 | MR 2225526
[34] Salibian-Barrera, M., Zamar, R. H.: Bootstrapping robust estimates of regression. Ann. Stat. 30 (2002), 556-582. DOI 10.1214/aos/1021379865 | MR 1902899 | Zbl 1012.62028
[35] Schneider, C., Vybíral, J.: A multivariate Riesz basis of ReLU neural networks. Appl. Comput. Harmon. Anal. 68 (2024), Article ID 101605, 16 pages. DOI 10.1016/j.acha.2023.101605 | MR 4659237 | Zbl 1532.68101
[36] Shao, J.: Asymptotic distribution of the weighted least squares estimator. Ann. Inst. Stat. Math. 41 (1989), 365-382. DOI 10.1007/BF00049402 | MR 1006496 | Zbl 0692.62012
[37] Shevlyakov, G. L., Vilchevski, N. O.: Robustness in Data Analysis: Criteria and Methods. Modern Probability and Statistics. VSP, Utrecht (2002).
[38] Späth, H.: Mathematical Algorithms for Linear Regression. Academic Press, Boston (1991).
[39] Su, P., Tarr, G., Muller, S.: Robust variable selection under cellwise contamination. J. Stat. Comput. Simulation 94 (2024), 1371-1387. DOI 10.1080/00949655.2023.2286316 | MR 4729530 | Zbl 07895654
[40] Sze, V., Chen, Y.-H., Yang, T.-J., Emer, J. S.: Efficient processing of deep neural networks. Synthesis Lectures on Computer Architecture 51. Morgan & Claypool Publishers, San Rafael (2020). DOI 10.1007/978-3-031-01766-7 | Zbl 1437.68006
[41] Víšek, J. Á.: Consistency of the least weighted squares under heteroscedasticity. Kybernetika 47 (2011), 179-206. MR 2828572 | Zbl 1220.62064
[42] Wickham, H., Gentleman, R., Ihaka, R., Chambers, J. M., Venables, W. N., Ripley, B. D.: R: The R Project for Statistical Computing. Available at https://www.r-project.org/ (2018).
[43] Yu, F. W., Ho, W. T., Chan, K. T., Sit, R. K. Y.: Critique of operating variables importance on chiller energy performance using random forest. Energy Buildings 139 (2017), 653-664. DOI 10.1016/j.enbuild.2017.01.063