The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis
Release time: 2019-11-01
Indexed by: Journal Papers
First Author: Alam, Saqib
Corresponding Author: Alam, S (reprint author), Dalian Univ Technol, Dept Elect Informat & Elect Engn, Black Bldg, Linggong Rd 2, Dalian 116024, Peoples R China.
Co-author: Yao, Nianmin
Date of Publication: 2019-09-01
Journal: COMPUTATIONAL AND MATHEMATICAL ORGANIZATION THEORY
Included Journals: SCIE, SSCI
Document Type: J
Volume: 25
Issue: 3
Page Number: 319-335
ISSN No.: 1381-298X
Key Words: Preprocessing; Machine learning; Sentiment analysis; Word2Vec
Abstract: Big data and its related technologies have recently become active areas of research. A huge amount of data is generated every minute, including unstructured data, which is now a topic of interest for researchers. Considerable research is under way in text analytics and text preprocessing. In this paper, we study the impact of different preprocessing steps on the accuracy of three machine learning algorithms for sentiment analysis. We applied different text preprocessing techniques and measured their impact on sentiment classification accuracy using three well-known machine learning classifiers: Naive Bayes (NB), maximum entropy (MaxE), and support vector machines (SVM). We calculated the accuracy of the three algorithms before and after applying the preprocessing steps. The results showed that the accuracy of the NB algorithm improved significantly after the preprocessing steps were applied, and that the accuracy of the SVM algorithm improved slightly. Interestingly, the MaxE algorithm showed no improvement in accuracy. Our work is a comparative study, and our results showed that, after the preprocessing steps were applied, the accuracy of the NB algorithm was again significantly higher than that of the other machine learning algorithms, followed by the MaxE and SVM algorithms. This work shows that text preprocessing affects the accuracy of machine learning algorithms, and in particular that the accuracy of the NB algorithm improves significantly after text preprocessing.
Translation or Not: no
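The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the cleaning steps (lowercasing, URL removal, punctuation removal, stop-word removal), the stop-word list, the toy sentences, and the names `preprocess`, `raw_tokens`, `NaiveBayes`, and `accuracy` are all assumptions made for illustration. Only a Naive Bayes classifier is shown, MaxE and SVM are omitted, and the record does not say which preprocessing steps the paper used.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "the", "is", "was", "this", "it", "and", "to", "of"}


def raw_tokens(text):
    """Baseline with no preprocessing: split on whitespace only."""
    return text.split()


def preprocess(text):
    """Assumed cleaning steps: lowercase, drop URLs and non-letters, remove stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # remove punctuation and digits
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            n_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # tokens never seen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(tokenizer, train, test):
    """Train on `train` and return the fraction of `test` labelled correctly."""
    model = NaiveBayes().fit([tokenizer(t) for t, _ in train], [y for _, y in train])
    hits = sum(model.predict(tokenizer(t)) == y for t, y in test)
    return hits / len(test)


# Invented toy data; it is not taken from the paper.
TRAIN = [
    ("Great movie, loved it!", "pos"),
    ("Awful plot. Terrible acting.", "neg"),
    ("Loved the acting", "pos"),
    ("Terrible, awful movie", "neg"),
]
TEST = [("LOVED it!!! http://example.com", "pos"), ("awful...", "neg")]

print("without preprocessing:", accuracy(raw_tokens, TRAIN, TEST))
print("with preprocessing:   ", accuracy(preprocess, TRAIN, TEST))
```

In this sketch, cleaning makes test tokens such as `LOVED` and `awful...` match the training vocabulary, which is one plausible way preprocessing could help a bag-of-words classifier. The toy numbers say nothing about the accuracies reported in the paper.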