Abstract/Details

Statistical models and analysis techniques for learning in relational data


2006 2006

Other formats: Order a copy

Abstract (summary)

Many data sets routinely captured by organizations are relational in nature---from marketing and sales transactions, to scientific observations and medical records. Relational data record characteristics of heterogeneous objects and persistent relationships among those objects (e.g., citation graphs, the World Wide Web, genomic structures). These data offer unique opportunities to improve model accuracy, and thereby decision-making, if machine learning techniques can effectively exploit the relational information.

This work focuses on how to learn accurate statistical models of complex, relational data sets and develops two novel probabilistic models to represent, learn, and reason about statistical dependencies in these data. Relational dependency networks are the first relational model capable of learning general autocorrelation dependencies, an important class of statistical dependencies that are ubiquitous in relational data. Latent group models are the first relational model to generalize about the properties of underlying group structures to improve inference accuracy and efficiency. Not only do these two models offer performance gains over current relational models, but they also offer efficiency gains which will make relational modeling feasible for large, relational datasets where current methods are computationally intensive, if not intractable.

We also formulate of a novel analysis framework to analyze relational model performance and ascribe errors to model learning and inference procedures. Within this framework, we explore the effects of data characteristics and representation choices on inference accuracy and investigate the mechanisms behind model performance. In particular, we show that the inference process in relational models can be a significant source of error and that relative model performance varies significantly across different types of relational data.

Indexing (details)


Subject
Computer science
Classification
0984: Computer science
Identifier / keyword
Applied sciences, Data mining, Learning, Machine learning, Relational data
Title
Statistical models and analysis techniques for learning in relational data
Author
Neville, Jennifer
Number of pages
155
Publication year
2006
Degree date
2006
School code
0118
Source
DAI-B 67/11, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
9780542977909
Advisor
Jensen, David
University/institution
University of Massachusetts Amherst
University location
United States -- Massachusetts
Degree
Ph.D.
Source type
Dissertations & Theses
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3242344
ProQuest document ID
305311615
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
http://search.proquest.com/docview/305311615
Access the complete full text

You can get the full text of this document if it is part of your institution's ProQuest subscription.

Try one of the following:

  • Connect to ProQuest through your library network and search for the document from there.
  • Request the document from your library.
  • Go to the ProQuest login page and enter a ProQuest or My Research username / password.