
Non-Topical Factors in Information Access
PROCEEDINGS
Jussi Karlgren, SICS, Sweden & Helsinki University, Finland
WebNet World Conference on the WWW and Internet, Honolulu, Hawaii. Publisher: Association for the Advancement of Computing in Education (AACE), Chesapeake, VA
Abstract
Research in information retrieval has traditionally concentrated on making assumptions about the content of documents based on very shallow semantic analysis through word occurrence statistics of various kinds. But texts are more than bags of words, and the semantic analysis that information retrieval systems typically use is overly simple. There is ample reason to broaden the view of what text is and why it is written. Better content analysis alone will not be enough: texts are more than their meaning. Texts have structure and context; they are written in a style that conforms to or departs from the genre in which they are to be understood; they may be carefully written or hastily thrown together; and they are written by various types of agents for various reasons. Beyond the information to be found in the text itself or obtained from its author, texts are used by readers of various backgrounds, for various reasons, and with varying degrees of satisfaction. This paper outlines a framework within which to extract more knowledge from texts than an approximation of their topic, and gives examples of how this knowledge can be used to design useful tools for information access.
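The contrast between a bag-of-words representation and a broader, non-topical view of text can be illustrated with a short sketch. This is not the framework of the paper. The crude tokenizer, the four surface measures of style, and the first-person pronoun list are hypothetical choices made only for this example. Two texts with identical word counts get identical bag-of-words representations but differ on a simple stylistic measure.

```python
import re
from collections import Counter

# Hypothetical pronoun list, chosen only for illustration.
FIRST_PERSON = {"i", "me", "my", "we", "us", "our"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text: str) -> Counter:
    """Topical view: word occurrence counts, with order and form discarded."""
    return Counter(tokenize(text))


def style_features(text: str) -> dict[str, float]:
    """Non-topical view: a few illustrative surface measures of style."""
    tokens = tokenize(text)
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(tokens) or 1
    return {
        "mean_word_length": sum(len(t) for t in tokens) / n,
        "type_token_ratio": len(set(tokens)) / n,
        "words_per_sentence": len(tokens) / (len(sentences) or 1),
        "first_person_rate": sum(t in FIRST_PERSON for t in tokens) / n,
    }


a = "The cat sat. The dog ran."
b = "The cat sat, the dog ran."
print(bag_of_words(a) == bag_of_words(b))       # True: same topical profile
print(style_features(a)["words_per_sentence"])  # 3.0
print(style_features(b)["words_per_sentence"])  # 6.0
```

A retrieval system that looks only at `bag_of_words` cannot tell the two texts apart. Even a trivial stylistic measure separates them, which is the kind of additional evidence the abstract argues for.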
Citation
Karlgren, J. (1999). Non-Topical Factors in Information Access. In Proceedings of WebNet World Conference on the WWW and Internet 1999 (pp. 27-31). Honolulu, Hawaii: Association for the Advancement of Computing in Education (AACE). Retrieved February 5, 2023 from https://www.learntechlib.org/primary/p/7117/.
© 1999 Association for the Advancement of Computing in Education (AACE)