
Sentence level information patterns for novelty detection


2006


Abstract (summary)

The detection of new information in a document stream is an important component of many potential applications. This thesis proposes a new novelty detection approach based on the identification of sentence level information patterns. Given a user's information need, some information patterns in sentences, such as combinations of query words, sentence lengths, named entities and phrases, and other sentence patterns, may contain more important and relevant information than single words. The work of the thesis includes three parts. First, we redefine "what is novelty detection" in light of the proposed information patterns. Examples of several different types of information patterns are given, corresponding to different types of users' information needs. Second, we analyze why the proposed information pattern concept has a significant impact on novelty detection. A thorough analysis of sentence level information patterns is carried out on data from the TREC novelty tracks, including sentence lengths, named entities (NEs), and sentence level opinion patterns. Finally, we present how we perform novelty detection based on information patterns, which focuses on the identification of previously unseen query-related patterns in sentences. A unified pattern-based approach to novelty detection is presented for both specific NE topics and more general topics. Experiments on novelty detection were carried out on data from the TREC 2002, 2003 and 2004 novelty tracks. Experimental results show that the proposed approach significantly improves the performance of novelty detection for both specific and general topics, and therefore the overall performance across all topics, in terms of precision at top ranks. Future research directions are suggested.
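
The abstract does not spell out the algorithm, so the following is only a rough illustrative sketch of the general idea of flagging sentences that contain previously unseen query-related patterns. Everything in it is an assumption made for illustration, not the thesis's method: the function names `extract_patterns` and `detect_novel` are hypothetical, a "pattern" is reduced to a (query word, capitalized token) pair, and capitalized tokens serve as a crude stand-in for real named entity recognition.

```python
import re


def extract_patterns(sentence, query_words):
    """Hypothetical pattern extractor (an assumption, not the thesis's definition).

    Pairs every query word found in the sentence with every capitalized
    token, using capitalization as a crude proxy for a named entity.
    The sentence-initial token is skipped because it is capitalized anyway.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", sentence)
    lowered = {t.lower() for t in tokens}
    hits = [w for w in query_words if w in lowered]
    entities = {t for t in tokens[1:] if t[0].isupper()}
    return {(q, e) for q in hits for e in entities}


def detect_novel(sentences, query_words):
    """Return sentences that contribute at least one pattern not seen before."""
    seen = set()
    novel = []
    for sentence in sentences:
        patterns = extract_patterns(sentence, query_words)
        if patterns - seen:  # at least one previously unseen pattern
            novel.append(sentence)
        seen |= patterns
    return novel


if __name__ == "__main__":
    stream = [
        "The merger with Acme was announced.",
        "Officials confirmed the merger with Acme.",
        "The merger also involves Globex.",
        "Weather was fine in Boston.",
    ]
    for s in detect_novel(stream, {"merger"}):
        print(s)
```

In this toy stream, the second sentence repeats a pattern already seen and the fourth contains no query word, so only the first and third sentences are reported as novel. A realistic system would replace the capitalization heuristic with a proper named entity tagger and would rank sentences rather than make a binary decision.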

Indexing (details)


Subject
Computer science
Information systems
Classification
0984: Computer science
0723: Information systems
Identifier / keyword
Communication and the arts, Applied sciences, Information patterns, Novelty detection, Question answering, Sentence
Title
Sentence level information patterns for novelty detection
Author
Li, Xiaoyan
Number of pages
153
Publication year
2006
Degree date
2006
School code
0118
Source
DAI-B 67/11, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
9780542977541
Advisor
Croft, W. Bruce
University/institution
University of Massachusetts Amherst
University location
United States -- Massachusetts
Degree
Ph.D.
Source type
Dissertations & Theses
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3242322
ProQuest document ID
305311679
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
http://search.proquest.com/docview/305311679