Application of linguistic cues in the analysis of language of hate groups
DOI:
https://doi.org/10.7494/csci.2015.16.2.145
Keywords:
hate speech, natural language processing, propaganda, machine learning
Abstract
Hate speech and fringe ideologies are social phenomena that thrive online. Members of the political and religious fringe can propagate their ideas via the Internet with less effort than through traditional media. In this article, we attempt to use linguistic cues, such as the occurrence of certain parts of speech, to distinguish the language of fringe groups from that of strictly informative sources. The aim of this research is to provide a preliminary model for identifying deceptive materials online, such as aggressive marketing and hate speech. In this paper, we focus on the political aspect. Our research shows that sentence length and the occurrence of adjectives and adverbs can help to identify differences between the language of fringe political groups and that of mainstream media.
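As a rough illustration of the cues named in the abstract, the following Python sketch computes a mean sentence length and the share of adjectives and adverbs in a text. It is not the authors' implementation: the function names are hypothetical, the sentence splitter is deliberately naive, and the input is assumed to be already tagged with Penn Treebank part-of-speech tags by some external tagger.

```python
import re

# Assumption: tokens carry Penn Treebank tags, where adjectives are
# JJ, JJR, JJS and adverbs are RB, RBR, RBS (wh-adverbs, WRB, are ignored).
ADJ_ADV_PREFIXES = ("JJ", "RB")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def mean_sentence_length(text):
    """Average number of whitespace-separated tokens per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def adj_adv_ratio(tagged_tokens):
    """Share of (word, tag) pairs tagged as adjective or adverb."""
    if not tagged_tokens:
        return 0.0
    hits = sum(1 for _, tag in tagged_tokens if tag.startswith(ADJ_ADV_PREFIXES))
    return hits / len(tagged_tokens)


if __name__ == "__main__":
    sample = "They lie. We always tell the plain truth!"
    tagged = [("We", "PRP"), ("always", "RB"), ("tell", "VBP"),
              ("the", "DT"), ("plain", "JJ"), ("truth", "NN")]
    print(mean_sentence_length(sample))
    print(adj_adv_ratio(tagged))
```

Values like these could serve as input features for a classifier that separates fringe from mainstream texts; which features and which classifier work best is an empirical question that this sketch does not address.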