
Beyond nouns and verbs


2009


Abstract (summary)

During the past decade, computer vision research has focused on constructing image-based appearance models of objects and action classes from large databases of positive and negative examples using machine learning. Visual inference, however, involves not only detecting and recognizing objects and actions but also extracting rich relationships between them to form storylines or plots. These relationships also improve the recognition performance of appearance-based models: instead of identifying individual objects and actions in isolation, such systems augment appearance-based models with contextual models built on object-object, action-action and object-action relationships. In this thesis, we look at the problem of using contextual information for recognition from three perspectives: (a) representation of contextual models; (b) the role of language in learning semantic/contextual models; and (c) learning of contextual models from weakly labeled data.
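
As a toy illustration of this kind of contextual reasoning (not taken from the thesis), the Python sketch below scores joint labelings of two image regions by adding hypothetical per-region appearance scores to a hypothetical object-object compatibility table; all region names, labels and numbers are invented.

    # Illustrative sketch only: combining per-region appearance scores with a
    # pairwise object-object contextual term. All values below are hypothetical.
    import itertools

    # Hypothetical appearance scores from independent detectors for two regions.
    appearance = {
        "region1": {"car": 0.6, "road": 0.3},
        "region2": {"car": 0.2, "road": 0.7},
    }

    # Hypothetical object-object compatibility (e.g., how plausible "car" and
    # "road" are as labels of neighboring regions).
    context = {
        ("car", "road"): 1.0,
        ("road", "car"): 0.4,
        ("car", "car"): 0.1,
        ("road", "road"): 0.5,
    }

    def joint_score(labeling):
        """Sum of appearance scores plus the pairwise contextual term."""
        (r1, l1), (r2, l2) = labeling
        return appearance[r1][l1] + appearance[r2][l2] + context[(l1, l2)]

    # Enumerate joint labelings and keep the most compatible one, rather than
    # labeling each region in isolation.
    candidates = [
        (("region1", l1), ("region2", l2))
        for l1, l2 in itertools.product(["car", "road"], repeat=2)
    ]
    best = max(candidates, key=joint_score)
    print(best, joint_score(best))

In this toy setup the contextual term can overturn a labeling that appearance alone would prefer, which is the intuition behind augmenting appearance-based models with relationship-based context.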

Our work departs from the traditional view of visual and contextual learning, in which individual detectors and relationships are learned separately; we instead focus on the simultaneous learning of visual appearance and contextual models from richly annotated, weakly labeled datasets. Specifically, we show how rich annotations can be used to constrain the learning of visually grounded models of nouns, prepositions and comparative adjectives from weakly labeled data. We also show how visually grounded models of prepositions and comparative adjectives can serve as contextual models for scene analysis. Finally, we present storyline models for the interpretation of videos. Storyline models go beyond pairwise contextual models and represent higher-order constraints by allowing only specific action sequences (stories). Visual inference with storyline models involves inferring the “plot” of the video (the sequence of actions) and recognizing the individual activities in the plot.
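
For illustration only (not the thesis implementation), the sketch below treats storyline inference as picking the highest-scoring action sequence from a small set of allowed “stories”, given per-segment action likelihoods; the actions, the allowed stories and the scores are all hypothetical.

    # Illustrative sketch only: storyline inference as choosing the best allowed
    # action sequence ("plot") for a video. All names and numbers are invented.

    # Hypothetical per-segment action likelihoods for a video with three segments.
    segment_scores = [
        {"pitch": 0.7, "swing": 0.2, "run": 0.1},
        {"pitch": 0.1, "swing": 0.6, "run": 0.3},
        {"pitch": 0.1, "swing": 0.2, "run": 0.7},
    ]

    # Hypothetical set of permissible stories: only these orderings are allowed,
    # which encodes higher-order constraints beyond pairwise relationships.
    allowed_stories = [
        ("pitch", "swing", "run"),
        ("pitch", "swing", "swing"),
        ("pitch", "run", "run"),
    ]

    def story_score(story):
        """Score a candidate plot by summing each segment's likelihood of its action."""
        return sum(scores[action] for scores, action in zip(segment_scores, story))

    best_story = max(allowed_stories, key=story_score)
    print(best_story, story_score(best_story))

Inferring the plot and recognizing the individual actions are thus coupled: a segment's label is chosen jointly with the others so that the whole sequence is an admissible story.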

Indexing (details)


Subject
Artificial intelligence;
Computer science
Classification
0800: Artificial intelligence
0984: Computer science
Identifier / keyword
Applied sciences; Computer vision; Image annotation; Learning; Probabilistic graphical models; Storyline models
Title
Beyond nouns and verbs
Author
Gupta, Abhinav
Number of pages
141
Publication year
2009
Degree date
2009
School code
0117
Source
DAI-B 70/09, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
9781109380989
Advisor
Davis, Larry
Committee member
Duraiswami, Ramani; Jacobs, David; Kedem, Benjamin; Shi, Jianbo
University/institution
University of Maryland, College Park
Department
Computer Science
University location
United States -- Maryland
Degree
Ph.D.
Source type
Dissertations & Theses
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3372850
ProQuest document ID
304926719
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
http://search.proquest.com/docview/304926719