BBC NEWS DATA CLASSIFICATION USING NAÏVE BAYES BASED ON BAG OF WORD

Hanan Abbas Salman Department of Computer Systems, Al-Furat Al-Awsat Technical University, Najaf Technical Institute, Iraq
Tameem Hameed Obaida Department of Computer Systems, Al-Furat Al-Awsat Technical University, Najaf Technical Institute, Iraq

Abstract

Sentiment analysis touches practically every aspect of human life and has a substantial impact on our actions. Thanks to the expansion and use of online technology, a vast amount of data now presents people's opinions in numerous domains, such as business and politics. In this paper, Naïve Bayes (NB) is used to examine opinions through text classification, assigning each text to the relevant category (business, entertainment, politics, sport, or tech). The paper examines the impact of combining NB classifiers (Multinomial, Gaussian, Bernoulli, and Complement) with feature extraction methods such as term frequency-inverse document frequency (TF-IDF) on the accuracy of classifying BBC news data. Classifier performance was measured using recall, precision, and F1-score. The results show that the Complement classifier achieved the highest accuracy, 97.604 percent, and was the best fit.
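The evaluation measures named in the abstract (precision, recall, F1-score, and accuracy) can be illustrated with a minimal sketch over the five news categories. The function names `per_class_scores` and `accuracy` are hypothetical helpers introduced here, and the label lists are made-up toy data, not the paper's predictions or results.

```python
from collections import Counter

# The five BBC news categories named in the abstract.
CATEGORIES = ["business", "entertainment", "politics", "sport", "tech"]


def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this class, but it was wrong
            fn[truth] += 1  # the true class was missed
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]  # documents assigned to this class
        actual = tp[label] + fn[label]     # documents truly in this class
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted category matches the true one."""
    return sum(t == g for t, g in zip(y_true, y_pred)) / len(y_true)


# Toy labels (assumed for illustration only).
y_true = ["business", "business", "sport", "sport",
          "tech", "politics", "entertainment", "tech"]
y_pred = ["business", "politics", "sport", "sport",
          "tech", "politics", "entertainment", "business"]

scores = per_class_scores(y_true, y_pred, CATEGORIES)
for label in CATEGORIES:
    p, r, f = scores[label]
    print(f"{label:14s} precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print(f"accuracy={accuracy(y_true, y_pred):.3f}")
```

Per-class scores are useful alongside accuracy because a classifier can reach high overall accuracy while still confusing two specific categories, as "business" and "tech" are confused in this toy data.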

Keywords:

Multinomial, Gaussian, Bernoulli, Complement, Term Frequency, Sentiment Analysis, Opinion Mining



