Autocorrelation and regularization of query-based information retrieval scores


2008

Abstract (summary)

Query-based information retrieval refers to the process of scoring documents given a short natural language query. Query-based information retrieval systems have been developed to support searching diverse collections such as the world wide web, personal email archives, news corpora, and legal collections. This thesis is motivated by one of the tenets of information retrieval: the cluster hypothesis. We define a design principle based on the cluster hypothesis which states that retrieval scores should be locally consistent. We refer to this design principle as score autocorrelation. Our experiments show that the degree to which retrieval scores satisfy this design principle correlates positively with system performance. We use this result to define a general, black box method for improving the local consistency of a set of retrieval scores. We refer to this process as local score regularization. We demonstrate that regularization consistently and significantly improves retrieval performance for a wide variety of baseline algorithms. Regularization is closely related to classic techniques such as pseudo-relevance feedback and cluster-based retrieval. We demonstrate that the effectiveness of these techniques may be explained by their regularizing behavior. We argue that regularization should be adopted either as a generic post-processing step or as a fundamental design principle for retrieval models.
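
The two ideas named in the abstract, score autocorrelation and local score regularization, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not taken from the thesis: the abstract does not state which autocorrelation measure or smoothing rule is used, so the sketch substitutes Moran's I (a standard spatial autocorrelation statistic) computed over a hypothetical document-similarity matrix, and a generic iterative neighbor-averaging rule. The names `morans_i`, `regularize`, `weights`, and `alpha` are hypothetical.

```python
def morans_i(scores, weights):
    """Moran's I of retrieval scores over a document-similarity matrix.

    weights[i][j] is an assumed, non-negative similarity between documents
    i and j. Higher values mean similar documents receive similar scores.
    """
    n = len(scores)
    mean = sum(scores) / n
    dev = [s - mean for s in scores]
    total_weight = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in dev)
    if total_weight == 0 or variance_sum == 0:
        return 0.0  # undefined in these cases; 0.0 is a convention chosen here
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * (cross / variance_sum)


def regularize(scores, weights, alpha=0.5, iterations=50):
    """Smooth scores toward the weighted average of each document's neighbors.

    Repeats f <- (1 - alpha) * original + alpha * neighbor_average(f).
    This is a generic graph-smoothing rule used as a stand-in, not
    necessarily the thesis's formulation.
    """
    n = len(scores)
    smoothed = list(scores)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            row_sum = sum(weights[i])
            if row_sum == 0:
                neighbor_avg = smoothed[i]  # isolated document: leave unchanged
            else:
                neighbor_avg = sum(weights[i][j] * smoothed[j]
                                   for j in range(n)) / row_sum
            updated.append((1 - alpha) * scores[i] + alpha * neighbor_avg)
        smoothed = updated
    return smoothed


# Toy data (made up): documents 0 and 1 are similar, as are documents 2 and 3.
toy_weights = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
toy_scores = [3.0, 1.0, 0.0, 0.0]

toy_smoothed = regularize(toy_scores, toy_weights)
print("autocorrelation before:", morans_i(toy_scores, toy_weights))
print("autocorrelation after: ", morans_i(toy_smoothed, toy_weights))
```

In this toy setup, smoothing pulls the scores of documents 0 and 1 toward each other, which raises the autocorrelation statistic. The procedure is black box in the sense the abstract describes: it needs only the baseline scores and a similarity matrix, not the internals of the retrieval model that produced the scores.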

Indexing (details)


Subject: Computer science
Classification: 0984: Computer science
Identifier / keyword: Applied sciences; Autocorrelation; Cluster hypothesis; Query-based retrieval; Regularization; Retrieval scores
Title: Autocorrelation and regularization of query-based information retrieval scores
Author: Diaz, Fernando
Number of pages: 190
Publication year: 2008
Degree date: 2008
School code: 0118
Source: DAI-B 69/08, Dissertation Abstracts International
Place of publication: Ann Arbor
Country of publication: United States
ISBN: 9780549663690
Advisor: Allan, James
Committee member: Croft, W. B.; Mahadevan, Sridhar; Staudenmayer, John
University/institution: University of Massachusetts Amherst
Department: Computer Science
University location: United States -- Massachusetts
Degree: Ph.D.
Source type: Dissertations & Theses
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 3315491
ProQuest document ID: 219900068
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL: http://search.proquest.com/docview/219900068