Analysis of the Percentage of the Poor Population in Regencies and Cities on the Island of Java
Keywords:
Semiparametric Regression, B-Spline, Poverty, Cross Validation
Abstract
Poverty remains a significant development issue on the island of Java, even though the region is the center of national economic activity. The relationship between socioeconomic factors and the percentage of the poor population is not always linear, so an approach that accommodates the data patterns more flexibly is needed. This study analyzes the percentage of the poor population using B-Spline semiparametric regression. The predictor variables are the Human Development Index (HDI), Labor Force Participation Rate (LFPR), Open Unemployment Rate (OUR), and percentage of the population. The data cover 119 regencies/cities in Java in 2024 and were divided into 80% training data and 20% testing data. The spline order and knot points were selected using cross-validation (CV). The best combination uses second order for two of the three nonparametric variables and third order for the remaining one, which yields a minimum CV value of 8.891438. Model evaluation gives a Mean Absolute Percentage Error (MAPE) of 20.97% on the training data and 30.04% on the testing data, both of which fall into the fairly accurate category. Overall, the model describes the relationship between the predictor variables and the percentage of the poor population well. The HDI has a consistent negative linear effect on the percentage of the poor population, while the LFPR, OUR, and percentage of the population show nonlinear relationships that differ from one value interval to the next. This indicates that B-Spline semiparametric regression is effective at capturing the complex relationships between socioeconomic factors and poverty levels.
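To make two ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It evaluates a B-spline basis of a chosen order with the Cox-de Boor recursion (order 2 is piecewise linear, order 3 piecewise quadratic) and computes the MAPE used for model evaluation. The function names, the example knot vector, and the MAPE category cut-offs (a commonly quoted scale in which 20% to 50% counts as fairly accurate) are assumptions for illustration; the abstract does not state the knots or thresholds actually used.

```python
def bspline_basis(x, knots, order):
    """Evaluate every B-spline basis function of the given order at x.

    `order` is polynomial degree + 1. `knots` is the full non-decreasing
    knot vector, including repeated boundary knots (hypothetical values
    are used in the example below).
    """
    # Order 1: indicator function of each knot span.
    basis = []
    for i in range(len(knots) - 1):
        inside = knots[i] <= x < knots[i + 1]
        # Close the last non-empty span on the right so x = max knot is covered.
        at_right_end = x == knots[-1] and knots[i] < knots[i + 1] == knots[-1]
        basis.append(1.0 if inside or at_right_end else 0.0)

    # Cox-de Boor recursion: raise the order one step at a time.
    for k in range(2, order + 1):
        raised = []
        for i in range(len(knots) - k):
            left_den = knots[i + k - 1] - knots[i]
            right_den = knots[i + k] - knots[i + 1]
            # Terms with a zero denominator are defined as zero.
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (knots[i + k] - x) / right_den * basis[i + 1] if right_den > 0 else 0.0
            raised.append(left + right)
        basis = raised
    return basis


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def mape_category(value):
    """Map a MAPE value to a label; the cut-offs are an assumed, commonly used scale."""
    if value < 10:
        return "highly accurate"
    if value < 20:
        return "good"
    if value <= 50:
        return "fairly accurate"
    return "inaccurate"


# Hypothetical knot vector on [0, 1] with one interior knot at 0.5.
example_knots = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]
row = bspline_basis(0.25, example_knots, order=3)
print(len(row), sum(row))          # number of basis functions and their sum
print(mape_category(20.97), "|", mape_category(30.04))
```

In a semiparametric fit, rows like `row` would be computed for each nonparametric predictor and placed next to the linear HDI column in the design matrix; candidate orders and knot positions can then be compared by their cross-validation score.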
