
Pachinko allocation: DAG-structured mixture models of topic correlations


2007


Abstract (summary)

Statistical topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, most existing approaches capture few or no correlations between topics. We propose the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). We present various structures within this framework, different parameterizations of topic distributions, and an extension to capture dynamic patterns of topic correlations. We also introduce a non-parametric Bayesian prior to automatically learn the topic structure from data. The model is evaluated on document classification, likelihood of held-out data, the ability to support fine-grained topics, and topical keyword coherence. With a highly scalable approximation, PAM has also been applied to discover topic hierarchies in very large datasets.
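
To make the DAG idea concrete, the following is a minimal sketch of how a document might be generated by walking a topic DAG from the root down to a word. It is an illustration only, not the dissertation's implementation. The toy graph, the node names, the symmetric Dirichlet parameter `alpha`, and the helpers `sample_dirichlet` and `generate_document` are all assumptions introduced here. As a simplification, every interior node redraws its distribution over children for each document.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(dag, alpha, root, n_words, rng):
    """Generate n_words words by repeated root-to-leaf walks through the DAG.

    dag maps each interior node (topic) to its list of children.
    Any node that is not a key of dag is treated as a leaf (a word).
    """
    # Assumption: one symmetric Dirichlet per interior node, redrawn per document.
    theta = {
        node: sample_dirichlet([alpha] * len(children), rng)
        for node, children in dag.items()
    }
    words = []
    for _ in range(n_words):
        node = root
        # Follow sampled edges until a leaf (word) is reached.
        while node in dag:
            node = rng.choices(dag[node], weights=theta[node], k=1)[0]
        words.append(node)
    return words


# Hypothetical toy DAG: sub_2 has two parents, so the structure is a DAG
# rather than a tree, and both super-topics can share that sub-topic.
toy_dag = {
    "root": ["super_a", "super_b"],
    "super_a": ["sub_1", "sub_2"],
    "super_b": ["sub_2", "sub_3"],
    "sub_1": ["gene", "protein"],
    "sub_2": ["model", "data"],
    "sub_3": ["market", "price"],
}

doc = generate_document(toy_dag, alpha=1.0, root="root", n_words=8,
                        rng=random.Random(0))
print(doc)
```

Because a super-topic is itself a distribution over sub-topics, sub-topics that tend to co-occur can be grouped under a common parent, which is one way to read the abstract's "nested" correlations.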

Indexing (details)


Subject
Statistics;
Artificial intelligence;
Computer science
Classification
0463: Statistics
0800: Artificial intelligence
0984: Computer science
Identifier / keyword
Applied sciences; Pure sciences; Directed acyclic graph; PAM; Pachinko allocation; Probabilistic models; Topic modeling
Title
Pachinko allocation: DAG-structured mixture models of topic correlations
Author
Li, Wei
Number of pages
100
Publication year
2007
Degree date
2007
School code
0118
Source
DAI-B 68/11, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
9780549330233
Advisor
McCallum, Andrew
Committee member
Blei, David; Croft, W. Bruce; Mahadevan, Sridhar; Staudenmayer, John
University/institution
University of Massachusetts Amherst
Department
Computer Science
University location
United States -- Massachusetts
Degree
Ph.D.
Source type
Dissertations & Theses
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3289214
ProQuest document ID
304846699
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
http://search.proquest.com/docview/304846699