Semantic Text Indexing

Zbigniew Kaleta

Abstract


This article presents a specific problem in the semantic analysis of natural-language texts, namely text indexing, and describes one field of its application (web browsing).
The main part of the article describes a computer system that assigns a set of semantic indexes (similar to keywords) to a given text. The indexing algorithm uses a semantic dictionary to find the specific words in a text that represent its content. The system also compares two given sets of semantic indexes to determine the similarity of the corresponding texts, expressed as a numerical value. The article describes the semantic dictionary, a tool essential to this task, and its usefulness, as well as the main concepts of the algorithm and the test results.
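The two steps named in the abstract, assigning semantic indexes with the help of a dictionary and comparing two index sets numerically, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dictionary, the function names `index_text` and `similarity`, and the use of the Jaccard coefficient as the similarity measure. The abstract does not specify the article's actual dictionary structure or scoring formula.

```python
import re

# Hypothetical toy semantic dictionary: surface word -> semantic index.
# The dictionary used in the article is far richer; this is illustration only.
SEMANTIC_DICTIONARY = {
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "truck": "vehicle",
    "doctor": "medicine",
    "hospital": "medicine",
}


def index_text(text, dictionary=SEMANTIC_DICTIONARY):
    """Return the set of semantic indexes whose words occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {dictionary[word] for word in words if word in dictionary}


def similarity(indexes_a, indexes_b):
    """Jaccard coefficient of two index sets (an assumed measure, in [0, 1])."""
    union = indexes_a | indexes_b
    if not union:
        return 0.0  # two empty index sets: treat as not comparable
    return len(indexes_a & indexes_b) / len(union)


first = index_text("The dog chased a car.")
second = index_text("A cat sat in the truck near the hospital.")
score = similarity(first, second)  # 2 shared indexes out of 3 distinct ones
```

The two sentences share no dictionary words, yet they still receive a non-zero score because "dog"/"cat" and "car"/"truck" map to the same indexes. This is the kind of match that plain keyword comparison would miss.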

Keywords


Text Subject, Semantic Analysis, Indexing

DOI: https://doi.org/10.7494/csci.2014.15.1.19
