THE IMPACT OF N-STAGE LATENT DIRICHLET ALLOCATION ON ANALYSIS OF HEADLINE CLASSIFICATION
DOI: https://doi.org/10.7494/csci.2022.23.3.4622
Abstract
Data analysis becomes more difficult as the volume of data grows. In particular, extracting meaningful insights from large collections and grouping items by their shared features without human intervention requires advanced methodologies. In text analysis, topic modeling methods address this problem for downstream tasks such as sentiment analysis, spam detection, and news classification. In this research, we benchmark several classifiers, namely Random Forest, AdaBoost, Naive Bayes, and Logistic Regression, using the classical LDA and n-stage LDA topic modeling methods for feature extraction in headline classification. We run our experiments on publicly available Turkish and English datasets with 3 and 5 classes. We demonstrate that n-stage LDA as a feature extractor obtains state-of-the-art performance for any downstream classifier. Random Forest was the most successful algorithm on both datasets.
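The feature-extraction setup described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn. It shows only the classical LDA variant: each headline is mapped to a vector of topic proportions, and that vector is passed to the four classifiers named above. The toy headlines, labels, the helper name `lda_topic_features`, and all parameter values are hypothetical and are not taken from the paper; the n-stage variant is not shown because the abstract does not describe its steps.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: six headlines in three classes (not from the paper).
headlines = [
    "team wins the championship final",
    "striker scores twice in cup match",
    "central bank raises interest rates",
    "stock market falls as inflation rises",
    "new phone features faster processor",
    "software update improves battery life",
]
labels = ["sport", "sport", "economy", "economy", "technology", "technology"]


def lda_topic_features(texts, n_topics=3, seed=0):
    """Return one row of topic proportions per input text (classical LDA)."""
    # Bag-of-words counts are the usual input representation for LDA.
    counts = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    # The result is the document-topic distribution; each row sums to 1.
    return lda.fit_transform(counts)


features = lda_topic_features(headlines)

# The four classifiers named in the abstract, with assumed settings.
classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each classifier on the topic features and predict on the same rows.
predictions = {
    name: clf.fit(features, labels).predict(features)
    for name, clf in classifiers.items()
}
```

The sketch fits and predicts on the same six headlines only to show how the pieces connect; it is not an evaluation. A real benchmark would hold out a test split and use far more data.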
License
Copyright (c) 2022 Computer Science
This work is licensed under a Creative Commons Attribution 4.0 International License.