Document controversy classification based on the Wikipedia category structure

Michał Jankowski-Lorek, Kazimierz Zieliński


Disputes and controversy are part of our culture, and they cannot be avoided on the Internet, where participants become more anonymous. There have been many studies of controversy, especially on social platforms such as Wikipedia. This free online encyclopedia has become a very popular data source for researchers studying user behavior and natural language processing. This paper presents a method that uses the Wikipedia category structure to determine the controversy of a single article. It is the first part of a proposed system for computing a topic controversy score for any given text.
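As a rough illustration of the general idea only, and not the authors' method (which the abstract does not describe), an article's controversy could be estimated from the categories it belongs to. This minimal sketch assumes a hypothetical table `category_rates` that gives, for each category, the fraction of its member articles already labeled controversial. It scores an article by averaging the rates of its known categories. The function name, the table, and the averaging rule are all assumptions made for illustration.

```python
from statistics import mean


def category_controversy_score(article_categories, category_rates, default=0.0):
    """Hypothetical sketch: average the controversy rates of an article's categories.

    article_categories: iterable of category names the article belongs to.
    category_rates: assumed mapping from category name to the fraction of its
        member articles labeled controversial (values in [0, 1]).
    default: score returned when none of the article's categories has a rate.
    """
    # Keep only categories for which a (hypothetical) rate is available.
    known = [category_rates[c] for c in article_categories if c in category_rates]
    # Average the known rates; fall back to the default if nothing is known.
    return mean(known) if known else default


if __name__ == "__main__":
    # Made-up rates, purely for illustration.
    rates = {"A": 0.8, "B": 0.1}
    print(category_controversy_score(["A", "B", "C"], rates))
```

With these made-up rates, an article in categories A, B and an unrated category C scores about 0.45. A practical system would likely also need to exploit the category hierarchy (for example, parent categories) and a learned decision threshold, both of which are left out of this sketch.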


Wikipedia; controversy; classification
