Broad-coverage hierarchical word sense disambiguation

2005

Abstract (summary)

In naturally occurring language, hearers and readers are faced with large numbers of “ambiguous” words, i.e., words with multiple senses, and “unknown” words, i.e., words they are encountering for the first time and that cannot yet be in their lexicon. Ambiguous and unknown words seem to cause little difficulty for humans, who infer their syntactic and semantic properties on the fly to resolve ambiguities and incorporate unknown words into their lexicons. They do, however, pose problems for dictionary-based approaches in natural language processing applications. To use the information contained in the dictionary, it is necessary to associate each word in the text being processed with one of the senses or concepts defined in the dictionary. If the word is ambiguous, it is necessary to identify the intended sense among the possible senses of the word; if it is unknown, it is necessary to assign the word to one among all possible senses defined by the dictionary. The acquisition of unknown words can thus be seen as a disambiguation task in which the possible senses are all senses listed in the dictionary. In this thesis we formulate a single unified approach for learning unknown words and performing word sense disambiguation. We focus on nouns, but our method can be generalized to verbs and other syntactic categories. We propose a broad-coverage method which can be applied to any kind of text. We frame this problem as a pattern classification task. Each ambiguous or unknown word is classified as belonging to one of the existing concepts on the basis of morphological, syntactic and semantic properties of the contexts in which it appears. Our system takes as input an existing dictionary, which defines a hierarchy of concepts, and a corpus of textual data, and disambiguates all nouns in the corpus. We demonstrate this by disambiguating all nouns in a 40-million-word collection of newspaper articles.
We also present empirical results from experiments with novel multi-level classification techniques, which exploit generalizations that hold at different levels of the concept hierarchy.
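
As a rough illustration of the classification framing described in the abstract, the sketch below assigns a noun to a concept from the words in its context, and lets evidence pooled at the parent level of a concept hierarchy contribute to the score. It is not the method used in the thesis: the toy hierarchy (PARENT), the training examples (TRAIN), the overlap-count scoring, and all function names are hypothetical and chosen only to keep the example small.

```python
from collections import Counter

# Hypothetical two-level hierarchy: each fine-grained concept has one coarser parent.
PARENT = {"dog": "animal", "cat": "animal", "car": "artifact", "hammer": "artifact"}

# Hypothetical training data: (context words observed around a noun, concept label).
TRAIN = [
    (["barked", "leash", "fur"], "dog"),
    (["purred", "whiskers", "fur"], "cat"),
    (["drove", "engine", "wheel"], "car"),
    (["nail", "handle", "tool"], "hammer"),
]


def train(examples):
    """Accumulate context-word counts per concept and per parent concept."""
    fine, coarse = {}, {}
    for context, concept in examples:
        fine.setdefault(concept, Counter()).update(context)
        coarse.setdefault(PARENT[concept], Counter()).update(context)
    return fine, coarse


def overlap(context, profile):
    """Sum the profile counts of the context words (unseen words count as 0)."""
    return sum(profile[word] for word in context)


def classify(context, fine, coarse):
    """Return the concept whose own profile plus its parent's profile best matches."""
    def score(concept):
        return overlap(context, fine[concept]) + overlap(context, coarse[PARENT[concept]])
    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(fine), key=score)


fine, coarse = train(TRAIN)
print(classify(["fur", "leash"], fine, coarse))
print(classify(["wheel", "engine"], fine, coarse))
```

In this toy setup an unknown noun is handled the same way as an ambiguous one: every concept in the hierarchy is a candidate, which mirrors the abstract's view of lexical acquisition as disambiguation over all listed senses.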

Indexing (details)

Subject: Computer science
Classification: 0290: Linguistics; 0984: Computer science
Identifier / keyword: Applied sciences; Language, literature and linguistics; Disambiguation; Lexical acquisition; Machine learning; Natural language; Ontologies; Word sense
Title: Broad-coverage hierarchical word sense disambiguation
Author: Ciaramita, Massimiliano
Number of pages:
Publication year:
Degree date:
School code:
Source: DAI-A 66/05, Dissertation Abstracts International
Place of publication: Ann Arbor
Country of publication: United States
ISBN: 9780542127045, 0542127040
Advisor: Johnson, Mark
University/institution: Brown University
University location: United States -- Rhode Island
Source type: Dissertations & Theses
Document type:
Dissertation/thesis number:
ProQuest document ID:
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL: