Nonparametric methods for learning from data

2006


Abstract (summary)

Developing statistical machine learning algorithms involves making assumptions, to varying degrees, about the nature of the data being modeled. Nonparametric methods are useful when prior information about the parametric form of the model is unavailable or invalid. This thesis presents nonparametric methods for tackling a range of modeling requirements.

The first part of this thesis presents a pair of unsupervised and supervised linear dimensionality reduction techniques suited to a variety of data types, including binary, integer, and real-valued data. They are based on a semi-parametric mixture of exponential-family distributions: no parametric assumptions are made about the latent distribution, while the parametric form of the noise distribution is chosen to match the data type (for example, Bernoulli for binary data). A key feature of the unsupervised method is that it guarantees asymptotic consistency of the estimated lower-dimensional signal subspace, a guarantee that other recently proposed methods lack. The supervised method finds the lower-dimensional space that retains the maximum possible information about the labels. We present efficient algorithms and experimental results for both methods.
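To make the Bernoulli case concrete, the following is a minimal sketch of exponential-family low-rank modeling for binary data (sometimes called logistic PCA). It is not the thesis's algorithm: the projected-gradient scheme, function name, and parameters below are illustrative assumptions, showing only the general idea of fitting rank-k natural parameters under Bernoulli noise.

```python
import numpy as np

def logistic_pca(X, k=2, lr=1.0, n_iter=200):
    """Illustrative sketch (not the thesis's method): fit a rank-k matrix of
    Bernoulli natural parameters Theta to binary data X by gradient steps on
    the negative log-likelihood, projecting back to rank k via truncated SVD."""
    n, d = X.shape
    Theta = np.zeros((n, d))
    for _ in range(n_iter):
        P = 1.0 / (1.0 + np.exp(-Theta))        # Bernoulli means sigma(Theta)
        Theta -= lr * (P - X)                   # gradient of the Bernoulli NLL
        U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
        Theta = (U[:, :k] * s[:k]) @ Vt[:k]     # project back to rank k
    return Theta
```

The low-dimensional representation is read off from the truncated SVD factors; choosing a different exponential-family noise model (e.g. Poisson for counts) would only change the mean function and the likelihood gradient.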

The second part of this thesis considers unsupervised learning of a density-based distance. We decompose the errors that arise in approximating these density-based distances into estimation and computation components. We prove upper and lower bounds on the rate of convergence of the estimation error in terms of the data dimensionality and the smoothness of the data density. We present a method for constructing a graph on the data, together with a performance guarantee on the computation error incurred when using it. We also show an upper bound on the approximation error of distances computed over nearest-neighbor graphs, which applies to several other similarity-measuring algorithms. Finally, we show that this graph construction enables consistent approximation of the minimal geodesics themselves, which is useful for nonlinear interpolation, and present experimental results.
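As a rough illustration of approximating a density-based distance over a graph, the sketch below builds a k-nearest-neighbor graph and runs Dijkstra with power-weighted edges (Euclidean lengths raised to p > 1, so long edges through sparse regions are penalized). This is a common stand-in, not the thesis's graph construction; the function name, the k-NN rule, and the power weighting are all assumptions made for the example.

```python
import heapq
import numpy as np

def density_based_distance(X, src, dst, k=5, p=2.0):
    """Hypothetical sketch: shortest-path distance from src to dst in a
    k-nearest-neighbor graph whose edge weights are Euclidean lengths
    raised to the power p (p > 1 penalizes hops across sparse regions)."""
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    nbrs = np.argsort(D, axis=1)[:, 1:k + 1]             # k nearest neighbors
    dist = np.full(n, np.inf)
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:                                            # Dijkstra's algorithm
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v in nbrs[u]:
            nd = d + D[u, v] ** p                        # power-weighted edge
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist[dst]
```

Recording predecessors during the Dijkstra pass would also recover the shortest path itself, i.e. a discrete approximation of the minimal geodesic used for interpolation.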

Indexing (details)

Subject: Electrical engineering; Artificial intelligence
Classification: 0544: Electrical engineering; 0800: Artificial intelligence
Identifier / keyword: Applied sciences; Dimensionality reduction; Learning; Machine learning; Nonparametric
Title: Nonparametric methods for learning from data
Number of pages:
Publication year: 2006
Degree date: 2006
School code:
Source: DAI-B 67/01, Dissertation Abstracts International
Place of publication: Ann Arbor
Country of publication: United States
ISBN: 9780542510861, 0542510863
Advisor: Orlitsky, Alon
University/institution: University of California, San Diego
University location: United States -- California
Source type: Dissertations & Theses
Document type:
Dissertation/thesis number:
ProQuest document ID:
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL: