Sentiment Analysis of TikTok User Reviews on the Google Play Store
Keywords:
Sentiment Analysis, TikTok Reviews, TF-IDF, Random Forest, ADASYN
Abstract
This study analyzes the sentiment of user reviews of the TikTok application on the Google Play Store to understand how users perceive the service. TikTok's large user base generates more reviews than can be analyzed manually, so automated sentiment analysis is needed to identify user opinions. A total of 17,016 reviews were collected by web scraping with the google-play-scraper library in Python. The reviews were preprocessed through cleaning, case folding, tokenizing, stopword removal, and stemming, and then labeled with the Valence Aware Dictionary and sEntiment Reasoner (VADER). The text was converted to numerical features using Term Frequency–Inverse Document Frequency (TF-IDF), and the data were split 80:20 into training and testing sets. Class imbalance was addressed with Adaptive Synthetic Sampling (ADASYN), and classification was performed with the Random Forest algorithm. Evaluation based on the confusion matrix showed an accuracy of 92.39%, precision of 96.21%, recall of 92.47%, and F1-score of 94.30%. These results indicate that the model classifies review sentiment effectively and provides insight into user perceptions of the TikTok application.
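The evaluation metrics named above can be derived from confusion-matrix counts, as in the following minimal sketch. It assumes a binary setting with one class treated as positive; the abstract does not state how many sentiment classes were used or how the per-class scores were averaged, so this is an illustration and not the study's actual evaluation code. The function names and the example counts are hypothetical. The last lines check that the reported F1-score is consistent with the reported precision and recall.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Hypothetical counts for illustration only; these are NOT the study's data.
example = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(example)

# Consistency check on the figures reported in the abstract:
# precision 96.21% and recall 92.47% should give an F1-score near 94.30%.
reported_f1 = f1_from(0.9621, 0.9247)
print(f"F1 from reported precision and recall: {reported_f1:.4f}")
```

The harmonic mean of the reported precision and recall comes to about 0.9430, which matches the reported F1-score of 94.30%.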
