Challenges and Countermeasures of Traditional Statistical Inference in the Era of Big Data
Research Article
Open Access
CC BY

Hanxiang Liu 1*
1 Virginia Tech
*Corresponding author: hanxiangl@vt.edu
Published on 11 November 2025
AEMPS Vol.239
ISSN (Print): 2754-1169
ISSN (Online): 2754-1177
ISBN (Print): 978-1-80590-525-7
ISBN (Online): 978-1-80590-526-4

Abstract

Big data poses unprecedented challenges for the traditional statistical inference techniques developed in the 20th century, undermining both their underlying assumptions and their usefulness in real-world applications. This paper systematically examines three interconnected core challenges: (1) inflation of the false discovery rate caused by multiple testing in high-dimensional settings; (2) the curse of dimensionality and sparsity in high-dimensional data analysis; and (3) the trade-off between computational complexity and statistical accuracy. These problems are systemic rather than isolated, and they call for comprehensive solutions. The paper reviews the corresponding countermeasures, including false discovery rate (FDR) control, regularization-based high-dimensional modeling, and distributed computing. As the paper shows, a methodological framework that integrates regularization, multiple-testing correction, and efficient computing strategies offers a workable solution to the significant limitations that traditional statistical methods face in the big data environment. These advances open a new path for statistical practice in the digital age by shifting the paradigm from one that prioritizes accuracy to one that prioritizes computational feasibility.
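
The abstract does not say which FDR control procedure is meant. As one illustrative instance, the sketch below implements the Benjamini-Hochberg step-up procedure and compares it with naive per-test thresholding. The function name `benjamini_hochberg`, the level `q = 0.05`, and the ten p-values are assumptions made up for illustration; they are not taken from the paper.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level q (under
    independence or positive dependence of the test statistics).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose ordered p-value satisfies p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten simultaneous tests (made up for illustration).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

naive = [p <= 0.05 for p in p_values]      # each test judged alone at alpha = 0.05
bh = benjamini_hochberg(p_values, q=0.05)  # FDR controlled at q = 0.05

print("naive rejections:", sum(naive))
print("BH rejections:", sum(bh))
```

On this toy input, per-test thresholding rejects five hypotheses while the step-up rule rejects only the two smallest p-values, which is the kind of correction the first challenge in the abstract refers to.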

Keywords:

Big Data, Statistical Inference, False Discovery Rate, High-Dimensional Statistics, Computational Statistics.

Liu, H. (2025). Challenges and Countermeasures of Traditional Statistical Inference in the Era of Big Data. Advances in Economics, Management and Political Sciences, 239, 45-51.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

About volume

Volume title: Proceedings of ICFTBA 2025 Symposium: Data-Driven Decision Making in Business and Economics

ISBN: 978-1-80590-525-7(Print) / 978-1-80590-526-4(Online)
Editor: Lukášak Varti
Conference date: 12 December 2025
Series: Advances in Economics, Management and Political Sciences
Volume number: Vol.239
ISSN: 2754-1169(Print) / 2754-1177(Online)