A DEEP LEARNING DRIVEN TEXT CLASSIFICATION APPROACH WITH NAMED ENTITY RECOGNITION

Güncel Sarıman

doi:10.7494/csci.2026.27.1.6738

Authors

Güncel Sarıman, Muğla Sıtkı Koçman University


Abstract

Text classification is one of the most widely studied problems in natural language processing, a core area of artificial intelligence that also covers tasks such as semantic analysis and natural language generation. This study analyzes the effect of detected named entities on text classification performance, with the aim of making the text preprocessing stage more effective. To reduce analysis time and improve performance, words were filtered with Named Entity Recognition (NER) after the classical preprocessing stage, using thresholds set at the 5% and 10% levels. The data were analyzed with several machine learning and deep learning algorithms and with Bidirectional Encoder Representations from Transformers (BERT), and the results are discussed in the final part of the study. On the task of classifying 50,000 news texts, the Support Vector Machine (SVM), a statistical machine learning classifier, achieved 93% success, Long Short-Term Memory (LSTM) achieved 87%, and BERT achieved 83%. Although the LSTM and BERT models scored numerically lower, their classifications were observed to preserve semantic integrity more strongly, and success generally increased after NER filtering. These results suggest that passing the dataset through the NER filter at the chosen thresholds benefits the models in terms of both time and performance.
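The threshold-based NER filtering step can be illustrated with a minimal Python sketch. The abstract does not state how the 5% and 10% thresholds are applied, so this example assumes one possible reading: a named entity is dropped when it occurs in fewer than the threshold fraction of documents. The function names `entity_document_share` and `ner_filter`, the toy corpus, and the entity lists (which in practice would come from an NER tagger) are all hypothetical and are not taken from the study.

```python
from collections import Counter


def entity_document_share(entities_per_doc):
    """Return, for each entity string, the fraction of documents containing it."""
    n_docs = len(entities_per_doc)
    doc_counts = Counter()
    for entities in entities_per_doc:
        doc_counts.update(set(entities))  # count each entity once per document
    return {entity: count / n_docs for entity, count in doc_counts.items()}


def ner_filter(tokens, entities, share, threshold):
    """Drop tokens that are entities whose document share is below `threshold`.

    ASSUMPTION: this filtering rule is one interpretation of the abstract,
    not the author's documented procedure.
    """
    rare = {e for e in entities if share.get(e, 0.0) < threshold}
    return [t for t in tokens if t not in rare]


# Toy corpus of 25 already-preprocessed documents (hypothetical data).
# "nasa" appears in 5 documents, "acme" in 2, "obscura" in 1.
corpus = []
for i in range(25):
    tokens, entities = ["report", "today"], []
    for name, n_docs_with_name in (("nasa", 5), ("acme", 2), ("obscura", 1)):
        if i < n_docs_with_name:
            tokens.append(name)
            entities.append(name)
    corpus.append((tokens, entities))

share = entity_document_share([entities for _, entities in corpus])

# Apply the two threshold levels mentioned in the abstract to the first document.
for threshold in (0.05, 0.10):
    tokens, entities = corpus[0]
    print(threshold, ner_filter(tokens, entities, share, threshold))
```

Under this assumed rule, the 5% threshold removes only the entity found in a single document, while the 10% threshold also removes the one found in two documents, so the higher threshold yields a smaller vocabulary for the downstream classifier.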


