Named-entity recognition for Hindi language using context pattern-based maximum entropy

Arti Jain; Divakar Yadav; Anuja Arora; Devendra K. Tayal

doi:10.7494/csci.2022.23.1.3977

Authors

Arti Jain Jaypee Institute of Information Technology, Noida, UP, India
Divakar Yadav NIT Hamirpur Himachal Pradesh, India
Anuja Arora
Devendra K. Tayal

Keywords:

Context Patterns, Gazetteer Lists, Hindi Language, Kaggle Datasets, Maximum Entropy, Named Entity Recognition, Feature Extension

Abstract

This paper describes a Named Entity Recognition (NER) system for the Hindi language using two methodologies: an existing BaseLine Maximum Entropy-based Named Entity (BL-MENE) model, and the Context Pattern-based MENE (CP-MENE) framework proposed in this work. BL-MENE utilizes several features for the NER task but suffers from inaccurate Named Entity (NE) boundary detection, misclassification errors, and partial recognition of NEs because certain essential elements are missing. CP-MENE incorporates an extensive set of features and patterns to overcome these problems. Its features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative-pronoun features. CP-MENE formulates a recursive relationship to extract high-ranked NE patterns, which are generated through regular expressions in Python code. Because Web content in the Hindi language is growing, especially in health-care applications, this work is conducted on the Hindi Health Data (HHD) corpus, a Kaggle dataset. We conducted experiments on four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP). Researchers usually work on PER NEs within news articles, while other NE types, especially those related to the health-care domain such as DIS, CNS, and SMP, are left out; these are incorporated in this research. CP-MENE improves the classification performance of NEs, achieving F-measures of 79.68% for PER, 72.50% for DIS, 68.78% for CNS, and 67.23% for SMP, which are comparable to other NER approaches.
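
As a rough illustration of how context patterns around NEs might be collected, ranked, and turned into regular expressions in Python, a minimal sketch follows. It is not the authors' implementation: the toy sentences (transliterated, not taken from the HHD corpus), the inline `<TAG>...</TAG>` annotation format, and the function names `context_patterns` and `tag_with_top_patterns` are all assumptions made for this example. Ranking here is by raw frequency only, and the recursive pattern extraction and the Maximum Entropy classifier described in the abstract are not reproduced.

```python
import re
from collections import Counter

# Hypothetical toy training data with inline NE annotations (assumed format).
TRAIN = [
    "<PER>Ramesh</PER> ko <DIS>malaria</DIS> ho gaya hai",
    "<PER>Sita</PER> ko <DIS>dengue</DIS> ho gaya hai",
    "usko <DIS>tapedik</DIS> ho gaya tha",
]

# Matches one annotated entity; group 1 is the tag, group 2 the entity text.
NE_RE = re.compile(r"<(PER|DIS|CNS|SMP)>(.+?)</\1>")


def strip_tags(text):
    """Remove annotation tags, keeping the entity text."""
    return NE_RE.sub(lambda m: m.group(2), text)


def context_patterns(sentences, window=1):
    """Count (tag, left context, right context) triples around each NE.

    `window` is the number of context tokens per side and must be >= 1.
    """
    counts = Counter()
    for sentence in sentences:
        for m in NE_RE.finditer(sentence):
            left = strip_tags(sentence[:m.start()]).split()[-window:]
            right = strip_tags(sentence[m.end():]).split()[:window]
            counts[(m.group(1), " ".join(left), " ".join(right))] += 1
    return counts


def tag_with_top_patterns(sentence, counts, top_k=1):
    """Apply the top_k most frequent two-sided patterns to raw text."""
    # Keep only patterns that have both a left and a right boundary.
    ranked = [key for key, _ in counts.most_common() if key[1] and key[2]]
    found = []
    for tag, left, right in ranked[:top_k]:
        # One whitespace-delimited token between the two boundary contexts.
        rx = re.compile(
            r"(?<!\S)" + re.escape(left) + r"\s+(\S+)\s+"
            + re.escape(right) + r"(?!\S)"
        )
        found.extend((m.group(1), tag) for m in rx.finditer(sentence))
    return found


counts = context_patterns(TRAIN)
print(tag_with_top_patterns("Mohan ko haija ho gaya", counts))
```

Tokens are matched with `\S+` rather than `\w+` because Python's `\w` does not match Devanagari combining vowel signs, so whitespace-delimited matching is the safer choice if the same sketch is applied to Hindi script.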

Author Biographies

Arti Jain, Jaypee Institute of Information Technology, Noida, UP, India

Department of Computer Science & Engineering
Assistant Professor (Sr Grade)
Jaypee Institute of Information Technology, Noida, UP, India

Divakar Yadav, NIT Hamirpur Himachal Pradesh, India

Department of Computer Science & Engineering
Associate Professor
NIT Hamirpur, India
