A statistical approach to improving accuracy in classifier ensembles

2008 2008

Other formats: Order a copy

Abstract (summary)

Popular ensemble classifier induction algorithms, such as bagging and boosting, construct the ensemble by optimizing component classifiers in isolation. The controllable degrees of freedom in an ensemble include the instance selection and feature selection for each component classifier. Because their degrees of freedom are uncoupled, the component classifiers are not built to optimize performance of the ensemble, rather they are constructed by minimizing individual training loss. Recent work in the ensemble literature contradicts the notion that a combination of the best individually performing classifiers results in lower ensemble error rates. Zenobi et al. demonstrated that ensemble construction should consider a classifier's contribution to ensemble accuracy and diversity even at the expense of individual classifier performance. To tradeoff individual accuracy against ensemble accuracy and diversity, a component classifier inducer requires knowledge of the choices made by the other ensemble members.

We introduce an approach, called DiSCO, that exercises direct control over the tradeoff between diversity and error by sharing ensemble-wide information on instance selection during training. A classifier's contribution to ensemble accuracy and diversity can be measured as it is constructed in isolation, but without sharing information among its peers in the ensemble during training, nothing can be done to control it. In this work, we explore a method for training the component classifiers collectively by sharing information about training set selection. This allows our algorithm to build ensembles whose component classifiers select complementary error distributions that maximize diversity while minimizing ensemble error directly. Treating ensemble construction as an optimization problem, we explore approaches using local search, global search and stochastic methods.

Using this approach we can improve ensemble classifier accuracy over bagging and boosting on a variety of data, particularly those for which the classes are moderately overlapping. In ensemble classification research, how to use diversity to build effective classifier teams is an open question. We also provide a method that uses entropy as a measure of diversity to train an ensemble classifier.

Indexing (details)

Artificial intelligence;
Computer science
0800: Artificial intelligence
0984: Computer science
Identifier / keyword
Applied sciences; Ensemble classifiers; Machine learning
A statistical approach to improving accuracy in classifier ensembles
Holness, Gary F.
Number of pages
Publication year
Degree date
School code
DAI-B 69/12, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
Utgoff, Paul E.
Committee member
Learned-Miller, Erik; Mahadevan, Sridhar; Staudenmayer, John
University of Massachusetts Amherst
Computer Science
University location
United States -- Massachusetts
Source type
Dissertations & Theses
Document type
Dissertation/thesis number
ProQuest document ID
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
Access the complete full text

You can get the full text of this document if it is part of your institution's ProQuest subscription.

Try one of the following:

  • Connect to ProQuest through your library network and search for the document from there.
  • Request the document from your library.
  • Go to the ProQuest login page and enter a ProQuest or My Research username / password.