Challenges and Countermeasures of Traditional Statistical Inference in the Era of Big Data

Hanxiang Liu

doi:10.54254/2754-1169/2025.BL29287

Advances in Economics, Management and Political Sciences

Research Article

Open Access

Hanxiang Liu ¹*

¹ Virginia Tech

* Corresponding author: hanxiangl@vt.edu

Published on 11 November 2025

AEMPS Vol.239

ISSN (Print): 2754-1177

ISSN (Online): 2754-1169

ISBN (Print): 978-1-80590-525-7

ISBN (Online): 978-1-80590-526-4

Abstract

Big data poses unprecedented challenges for traditional statistical inference techniques developed in the 20th century, undermining both their underlying assumptions and their usefulness in real-world applications. This paper systematically examines three interconnected core challenges: (1) the uncontrolled false discovery rate caused by multiple testing in high-dimensional settings; (2) the curse of dimensionality and sparsity in high-dimensional data analysis; and (3) the trade-off between computational complexity and statistical accuracy. These problems are systemic rather than isolated, and they call for comprehensive solutions. The paper reviews the corresponding countermeasures, including false discovery rate (FDR) control strategies, regularization-based high-dimensional modeling techniques, and distributed computing techniques. It argues that an innovative methodological framework integrating regularization, multiple testing correction, and efficient computing strategies offers a workable response to the significant limitations that traditional statistical methods face in the big data environment. These advances point to a new path for statistical practice in the digital age by shifting the paradigm from one that prioritizes accuracy to one that prioritizes computational feasibility.

Keywords:

Big Data, Statistical Inference, False Discovery Rate, High-Dimensional Statistics, Computational Statistics.
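
To make the first challenge in the abstract concrete, the sketch below contrasts naive per-test thresholding with one widely used FDR control method, the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR procedure the paper reviews, so the choice of method, the function name `benjamini_hochberg`, and the p-values are illustrative assumptions rather than material from the paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Sketch of the Benjamini-Hochberg step-up procedure at target FDR level q.
    (Illustrative helper; not taken from the paper.)
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten simultaneous tests (made up for illustration).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = [p <= 0.05 for p in p_values]      # each test judged on its own
bh = benjamini_hochberg(p_values, q=0.05)  # FDR-controlled decisions

print(sum(naive), sum(bh))  # prints: 5 2
```

On these made-up values, thresholding each test at 0.05 declares five discoveries, while the step-up rule keeps only the two smallest p-values. With thousands of tests, that gap is what separates a controlled false discovery rate from an uncontrolled one.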
