Research in information retrieval has traditionally concentrated on making assumptions about the content of documents based on very shallow semantic analysis through word occurrence statistics of various kinds. But texts are more than bags of words, and the semantic analysis information retrieval systems typically used is overly simple. There is ample reason to try to broaden the view of what text is and why. Better content analysis alone will not be enough. Texts are more than their meaning. Texts have structure, they have context, they are written in a style conformant or discordant to a genre they are to be understood in, they may be carefully written or hastily thrown together, they are written by various types of agent for various reasons. Besides information to be found in the text or from the author, texts are used by readers of various backgrounds, for various reasons, and with varying degree of satisfaction. This paper outlines a framework within which to find more knowledge from texts than an approximation of their topic, and gives examples of how to use this knowledge to design useful tools for information access.
Invited talk