Chapter 8: Evaluation in Information Retrieval
& Chapter 9: Relevance feedback and query expansion
In the previous chapter, which I read in the past month, the authors presented many alternatives in designing an IR system; in this chapter, they discuss how to measure the effectiveness of an IR system and how to develop further measures for evaluating ranked retrieval results. User utility is also measured, since the user's happiness is very important for a good IR system; for example, the speed of response and the size of the index are among the factors in users' happiness. The chapter also mentions that users' satisfaction is determined by many other factors, for example the design of the user interface.
The standard approach to information retrieval system evaluation revolves around the notion of relevant and nonrelevant documents. But a document is relevant not merely because it contains all the words in the query; it is relevant because it addresses the underlying information need, which is often not stated overtly.
In addition, a list of the most standard test collections and evaluation series is introduced in this chapter, for example the Cranfield collection, TREC, and the NII Test Collection. In evaluation of unranked retrieval sets, the two most frequent and basic measures of information retrieval effectiveness are precision and recall. An alternative might seem to be accuracy, which is the fraction of classifications that are correct. But accuracy is not an appropriate measure for the information retrieval problem: in a typical collection almost all documents are nonrelevant, so a system that returns nothing at all can still achieve very high accuracy.
The advantage of having two separate numbers, precision and recall, is that in many circumstances one is more important than the other.
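The measures above can be sketched in a few lines of Python. This is a minimal illustration with made-up document IDs and collection size, not code from the book; it also shows why accuracy can be misleadingly high when nonrelevant documents dominate.

```python
def precision_recall_accuracy(retrieved, relevant, collection_size):
    """Compute the basic unranked evaluation measures.

    retrieved, relevant: sets of document IDs; collection_size: total number
    of documents N in the collection.
    """
    tp = len(retrieved & relevant)          # relevant docs we retrieved
    fp = len(retrieved - relevant)          # nonrelevant docs we retrieved
    fn = len(relevant - retrieved)          # relevant docs we missed
    tn = collection_size - tp - fp - fn     # everything correctly left out

    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    accuracy = (tp + tn) / collection_size
    return precision, recall, accuracy

# Toy example: in a collection of 1000 docs, 10 are relevant and we retrieve
# 8 documents, of which 4 are actually relevant.
retrieved = {1, 2, 3, 4, 5, 6, 7, 8}
relevant = {1, 2, 3, 4, 101, 102, 103, 104, 105, 106}
p, r, a = precision_recall_accuracy(retrieved, relevant, 1000)
print(p, r, a)  # 0.5 0.4 0.99
```

Note that accuracy comes out at 0.99 even though the system found fewer than half of the relevant documents, which is exactly why precision and recall are preferred.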
In a ranked retrieval context, appropriate sets of retrieved documents are naturally given by the top k retrieved documents. For each such set, the precision and recall values can be computed and plotted as a precision-recall curve.
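Computing the points of such a curve is straightforward: walk down the ranking and record precision and recall after each rank k. The ranking and relevance set below are invented for illustration.

```python
def pr_points(ranking, relevant):
    """Precision and recall after each rank k of a ranked result list."""
    points, hits = [], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / k, hits / len(relevant)))  # (P@k, R@k)
    return points

ranking = ["d3", "d1", "d7", "d2", "d9"]   # system output, best first
relevant = {"d1", "d2", "d4"}              # judged relevant documents
points = pr_points(ranking, relevant)
for p, r in points:
    print(f"P={p:.2f} R={r:.2f}")
```

Plotting these (P, R) pairs gives the characteristic sawtooth shape: precision jumps up whenever the next retrieved document is relevant and drifts down otherwise, while recall only ever increases.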
Moreover, to evaluate the system, we have to gather relevance judgments. The most standard approach is called pooling, where relevance is assessed over a subset of the collection that is formed from the top k documents returned by a number of different IR systems. But relevance judgments are quite idiosyncratic and variable, and the success of an IR system depends on how good it is at satisfying the needs of these idiosyncratic humans, one information need at a time.
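The pooling step itself is just a set union over the per-system top-k lists; only the pooled documents are shown to human judges. A minimal sketch, with hypothetical run outputs:

```python
def pool(runs, k):
    """Form the judging pool: the union of the top-k documents of each run.

    runs: list of ranked result lists (best document first).
    """
    pooled = set()
    for run in runs:
        pooled.update(run[:k])
    return pooled

run_a = ["d1", "d2", "d3", "d4"]
run_b = ["d3", "d5", "d1", "d6"]
print(sorted(pool([run_a, run_b], 3)))  # ['d1', 'd2', 'd3', 'd5']
```

Documents outside the pool are simply assumed nonrelevant, which is the main approximation pooling makes.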
Finally, user utility is discussed, which means making the user happy. For a web search engine, happy search users are those who find what they want and desire to use this search engine again. But it is very hard to investigate the satisfaction of the user.
Actually, we have already read about synonymy in chapter 1, which means the same concept may be referred to using different words. In chapter 9, the authors discuss the ways in which a system can help with query refinement, either fully automatically or with the user in the loop.
Global methods: techniques for expanding or reformulating a query
1). Query expansion/reformulation with a thesaurus or WordNet
2). Query expansion via automatic thesaurus generation
3). Techniques like spelling correction
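Thesaurus-based expansion (item 1 above) can be sketched very simply: for each query term, add its known synonyms alongside the original term. The tiny hand-built thesaurus below is hypothetical; a real system would draw on WordNet or an automatically generated thesaurus.

```python
# Hypothetical toy thesaurus, standing in for WordNet or an
# automatically generated one.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "fast": ["quick", "rapid"],
}

def expand_query(terms, thesaurus):
    """Add every known synonym of each query term, keeping the originals."""
    expanded = list(terms)
    for t in terms:
        for syn in thesaurus.get(t, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded

print(expand_query(["fast", "car"], THESAURUS))
# ['fast', 'car', 'quick', 'rapid', 'automobile', 'vehicle']
```

The usual caveat applies: expansion tends to raise recall but can hurt precision when a synonym is appropriate only in some senses of the original term.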
Local methods: adjust a query to the documents that initially appear to match the query.
1). Relevance feedback (most commonly used) --> Interactive relevance feedback can give very substantial gains in retrieval performance.
2). Pseudo relevance feedback, also known as blind relevance feedback --> provides a method for automatic local analysis: do normal retrieval to find an initial set of most relevant documents, then assume the top-ranked ones are relevant and perform relevance feedback as usual.
3). Global indirect relevance feedback
The core idea of RF is to involve the user in the retrieval process so as to improve the final result set. --> The Rocchio Algorithm
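The Rocchio update can be sketched directly: move the original query vector toward the centroid of the documents the user marked relevant and away from the centroid of those marked nonrelevant. The default weights α=1, β=0.75, γ=0.15 are the reasonable values suggested in the book; the three-term vectors below are invented for illustration.

```python
def rocchio(q, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query update over equal-length lists of term weights.

    q: original query vector; rel_docs / nonrel_docs: judged document vectors.
    """
    n = len(q)
    rel_centroid = [sum(d[i] for d in rel_docs) / len(rel_docs)
                    for i in range(n)]
    nonrel_centroid = [sum(d[i] for d in nonrel_docs) / len(nonrel_docs)
                       for i in range(n)]
    # Negative term weights make no sense in a query, so clip at zero.
    return [max(0.0, alpha * q[i] + beta * rel_centroid[i]
                - gamma * nonrel_centroid[i])
            for i in range(n)]

q = [1.0, 0.0, 0.0]                       # query mentions only term 0
rel = [[1.0, 1.0, 0.0], [1.0, 0.5, 0.0]]  # judged relevant
nonrel = [[0.0, 0.0, 1.0]]                # judged nonrelevant
q_new = rocchio(q, rel, nonrel)
print(q_new)  # [1.75, 0.5625, 0.0]
```

Note how term 1, absent from the original query, now carries positive weight because the relevant documents contain it, while term 2 is pushed to zero.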
Probabilistic relevance feedback --> Naive Bayes probabilistic model
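In the probabilistic variant, the judged documents are used to re-estimate, for each term, the probability that the term appears in a relevant versus a nonrelevant document. The sketch below uses the common add-0.5 smoothing and approximates the nonrelevant statistics from the rest of the collection; the documents (as term sets) are invented for illustration.

```python
def term_prob_estimates(term, relevant_docs, all_docs):
    """Smoothed estimates of P(term | relevant) and P(term | nonrelevant).

    relevant_docs: judged-relevant documents, each a set of terms.
    all_docs: the whole (toy) collection; nonrelevant stats are approximated
    from the documents outside the relevant set.
    """
    vr = len(relevant_docs)
    vrt = sum(1 for d in relevant_docs if term in d)
    n = len(all_docs)
    nt = sum(1 for d in all_docs if term in d)
    p_rel = (vrt + 0.5) / (vr + 1)                  # add-0.5 smoothing
    p_nonrel = (nt - vrt + 0.5) / (n - vr + 1)
    return p_rel, p_nonrel

docs = [{"jaguar", "car"}, {"car", "engine"}, {"jaguar", "animal"},
        {"animal", "zoo"}]
relevant = [docs[0], docs[1]]   # the user marked the "car" documents relevant
p_rel, p_non = term_prob_estimates("car", relevant, docs)
print(p_rel, p_non)
```

Terms with a high ratio of p_rel to p_non (here, "car" rather than "animal") are the ones a Naive Bayes model would promote in the reweighted query.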
Relevance feedback on the Web --> explicit relevance feedback has been little used in web search, and the successful use of web links can be viewed as a form of implicit feedback.
Automatic thesaurus generation --> an alternative to the cost of a manual thesaurus: analyze a collection of documents, either by exploiting word co-occurrence, or by using a shallow grammatical analysis of the text to exploit grammatical relations or dependencies.
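The co-occurrence route can be sketched with a term-document count matrix A: the similarity matrix C = A·Aᵀ scores how often (and how heavily) two terms appear in the same documents, and each term's nearest neighbor in C is its candidate thesaurus entry. The toy matrix below is invented for illustration.

```python
# Toy term-document count matrix: rows are terms, columns are documents.
terms = ["car", "automobile", "engine", "banana"]
A = [
    [2, 0, 1, 0],   # car
    [1, 0, 2, 0],   # automobile
    [1, 1, 1, 0],   # engine
    [0, 0, 0, 3],   # banana
]

def cooccurrence_similarity(A):
    """C = A * A-transpose: C[u][v] is the weighted co-occurrence of u and v."""
    n = len(A)
    return [[sum(A[u][d] * A[v][d] for d in range(len(A[u])))
             for v in range(n)]
            for u in range(n)]

C = cooccurrence_similarity(A)
# Most similar term to "car", excluding itself:
u = terms.index("car")
best = max((v for v in range(len(terms)) if v != u), key=lambda v: C[u][v])
print(terms[best])  # automobile
```

As the chapter notes, such automatically generated thesauri are noisy (frequent co-occurrence does not guarantee synonymy), which is why the grammatical-analysis variant can give cleaner relations.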