Improving access to organized information

2006 2006

Other formats: Order a copy

Abstract (summary)

We introduce several new models and methods for improving access to organized information. The first model, Constrained Subtree Selection (CSS), has applications in web site design and the reorganization of directory structures. Given a hierarchy represented as a rooted DAG G with n weighted leaves, one selects a subtree of the transitive closure of G that minimizes the expected path cost. Path cost is the sum of the degree costs along a path from the root to a leaf. Degree cost, γ, is a function of the out degree of a node. We give a sufficient condition for γ that makes CSS NP-complete. This result holds even when the leaves have equal weight. Turning to algorithms, we give a polynomial time solution for instances of CSS where G does not constrain the choice of subtrees and γ favors nodes with at most k links. Even though CSS remains NP-hard for constant degree DAGs, we give an O(log(k)γ(d+1)) approximation for any G with maximum degree d, provided that γ favors nodes with at most k links. Finally, we give a complete characterization of the optimal trees for two special cases: (1) linear degree cost in unconstrained graphs and uniform probability distributions, and (2) logarithmic degree cost in arbitrary DAGs and uniform probability distributions.

The second problem, Category Tree (CT), seeks a decision tree for categorical data where internal nodes are categories, edges are appropriate values for the categories, and leaves are data items. CT generalizes the well-studied Decision Tree (DT) problem. Our results resolve two open problems: We give a ln n + 1-approximation for DT and show DT does not have a polynomial time approximation scheme unless P=NP. Our work, while providing the first non-trivial upper and lower bounds on approximating DT, also demonstrates that DT and a subtly different problem which also bears the name decision tree have fundamentally different approximation complexity.

We complement the above models with a new pruning method for k nearest neighbor queries on R-trees. We show that an extension to a popular depth-first 1-nearest neighbor query results in a theoretically better search. We call this extension Promise-Pruning and construct a class of R-trees where its application reduces the search space exponentially.

Indexing (details)

Computer science
0984: Computer science
Identifier / keyword
Applied sciences; Approximations; Data structures; Dynamic organization; Information
Improving access to organized information
Heeringa, Brent
Number of pages
Publication year
Degree date
School code
DAI-B 67/11, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
Adler, Micah
University of Massachusetts Amherst
University location
United States -- Massachusetts
Source type
Dissertations & Theses
Document type
Dissertation/thesis number
ProQuest document ID
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
Access the complete full text

You can get the full text of this document if it is part of your institution's ProQuest subscription.

Try one of the following:

  • Connect to ProQuest through your library network and search for the document from there.
  • Request the document from your library.
  • Go to the ProQuest login page and enter a ProQuest or My Research username / password.