# Abstract/Details

## Improving access to organized information

2006 2006

### Abstract (summary)

We introduce several new models and methods for improving access to organized information. The first model, Constrained Subtree Selection (CSS), has applications in web site design and the reorganization of directory structures. Given a hierarchy represented as a rooted DAG *G* with *n* weighted leaves, one selects a subtree of the transitive closure of * G* that minimizes the expected path cost. Path cost is the sum of the degree costs along a path from the root to a leaf. Degree cost, γ, is a function of the out degree of a node. We give a sufficient condition for γ that makes CSS **NP**-complete. This result holds even when the leaves have equal weight. Turning to algorithms, we give a polynomial time solution for instances of CSS where *G* does not constrain the choice of subtrees and γ favors nodes with at most *k* links. Even though CSS remains **NP**-hard for constant degree DAGs, we give an *O*(log(*k*)γ(d+1)) approximation for any *G* with maximum degree *d*, provided that γ favors nodes with at most *k* links. Finally, we give a complete characterization of the optimal trees for two special cases: (1) linear degree cost in unconstrained graphs and uniform probability distributions, and (2) logarithmic degree cost in arbitrary DAGs and uniform probability distributions.

The second problem, Category Tree (CT), seeks a decision tree for categorical data where internal nodes are categories, edges are appropriate values for the categories, and leaves are data items. CT generalizes the well-studied Decision Tree (DT) problem. Our results resolve two open problems: We give a ln *n* + 1-approximation for DT and show DT does not have a polynomial time approximation scheme unless P=NP. Our work, while providing the first non-trivial upper and lower bounds on approximating DT, also demonstrates that DT and a subtly different problem which also bears the name decision tree have fundamentally different approximation complexity.

We complement the above models with a new pruning method for k nearest neighbor queries on R-trees. We show that an extension to a popular depth-first 1-nearest neighbor query results in a theoretically better search. We call this extension Promise-Pruning and construct a class of R-trees where its application reduces the search space exponentially.