Analysis of data pre-processing methods for the sentiment analysis of reviews

Tuba Parlar, Ayşe Selma Özel, Fei Song

Abstract


The aim of this study is to analyse the effects of data pre-processing methods on sentiment analysis and to determine which of these methods, and which combinations of them, are effective for English and for an agglutinative language such as Turkish. We also address the research question: “Is there any difference between agglutinative and non-agglutinative languages in terms of pre-processing methods for sentiment analysis?” We find that performance on the English reviews is generally higher than on the Turkish reviews, which we attribute to differences between the two languages in vocabulary, writing style, and the agglutinative nature of Turkish.
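To make the idea of comparing pre-processing methods and their combinations concrete, here is a minimal sketch, not the authors' procedure. It assumes three hypothetical steps (lowercasing, punctuation stripping, and removal of a tiny illustrative stopword list), applies every subset of them in a fixed order, and reports the resulting vocabulary size as a rough proxy for how strongly each combination normalises the feature space. The step names, the stopword list, and all helper functions are illustrative assumptions; the abstract does not state which methods were evaluated.

```python
import string
from itertools import combinations

# Hypothetical stopword list for illustration only (a few English and Turkish
# function words); this is NOT the list used in the study.
STOPWORDS = {"the", "a", "is", "and", "bu", "ve", "bir"}


def lowercase(tokens):
    """Lowercase every token (Python's default, non-locale-aware mapping)."""
    return [t.lower() for t in tokens]


def strip_punctuation(tokens):
    """Remove ASCII punctuation and drop tokens that become empty."""
    table = str.maketrans("", "", string.punctuation)
    stripped = (t.translate(table) for t in tokens)
    return [t for t in stripped if t]


def remove_stopwords(tokens):
    """Drop tokens whose lowercase form is in the illustrative stopword list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


# Hypothetical step registry; dict order fixes the order in which steps run.
STEPS = {
    "lower": lowercase,
    "punct": strip_punctuation,
    "stop": remove_stopwords,
}


def preprocess(text, step_names):
    """Whitespace-tokenise text, then apply the selected steps in STEPS order."""
    tokens = text.split()
    for name in STEPS:
        if name in step_names:
            tokens = STEPS[name](tokens)
    return tokens


def all_combinations():
    """Yield every subset of STEPS, from no pre-processing to all steps."""
    names = list(STEPS)
    for r in range(len(names) + 1):
        yield from combinations(names, r)


def vocabulary_size(reviews, step_names):
    """Count distinct tokens after applying the given steps to all reviews."""
    vocab = set()
    for review in reviews:
        vocab.update(preprocess(review, step_names))
    return len(vocab)


if __name__ == "__main__":
    reviews = ["The film is GREAT!", "the film is great", "A great film."]
    for combo in all_combinations():
        print(combo or ("none",), vocabulary_size(reviews, combo))
```

Running the script prints one line per combination (8 in total for three steps); with the toy reviews above, the vocabulary shrinks from 8 distinct tokens with no pre-processing to 2 with all three steps. One caveat relevant to Turkish: Python's `str.lower()` is not locale-aware, so it maps `I` to `i` rather than to the Turkish dotless `ı`; a real Turkish pipeline would need locale-specific case folding.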

Keywords


Data pre-processing, feature selection, sentiment analysis, text classification





DOI: https://doi.org/10.7494/csci.2019.20.1.3097
