The Impact of N-Stage Latent Dirichlet Allocation on Analysis of Headline Classification

Authors

  • Zekeriya Anil Guven, Ege University
  • Banu Diri, Yildiz Technical University
  • Tolgahan Cakaloglu, Walmart Global Tech

DOI:

https://doi.org/10.7494/csci.2022.23.3.4622

Abstract

Data analysis becomes more difficult as the volume of data grows. More specifically, extracting meaningful insights from large amounts of data and grouping the data by shared features without human intervention requires advanced methods. In text analysis, topic modeling methods address this problem and support downstream tasks such as sentiment analysis, spam detection, and news classification. In this research, we benchmark several classifiers, namely Random Forest, AdaBoost, Naive Bayes, and Logistic Regression, using classical Latent Dirichlet Allocation (LDA) and n-stage LDA as feature extraction methods for headline classification. We run our experiments on publicly available Turkish and English datasets with 3 and 5 classes. We show that using n-stage LDA as the feature extractor yields state-of-the-art performance with each of the downstream classifiers tested. Random Forest was the most successful classifier on both datasets.

Published

2022-10-02

How to Cite

Guven, Z. A., Diri, B., & Cakaloglu, T. (2022). The impact of n-stage latent Dirichlet allocation on analysis of headline classification. Computer Science, 23(3). https://doi.org/10.7494/csci.2022.23.3.4622

Issue

Vol. 23 No. 3 (2022)

Section

Articles