Ecological inference problem revisited: Statistical modeling in a not so ideal world

2005


Abstract (summary)

The problem of ecological inference arises when conclusions about individual behavior must be drawn from aggregate data, a situation frequently encountered in the social sciences and epidemiology. In this dissertation, we propose an incomplete data framework for ecological inference and, using the data augmentation approach, develop various models for the ecological inference problem in 2 x 2 tables. Following a brief introduction, the dissertation consists of three interrelated chapters on the incomplete data approach to the ecological inference problem.

In Chapter 2, we formulate the ecological inference problem in 2 x 2 tables as an incomplete data problem, assuming no contextual effect. This framework directly incorporates the deterministic bounds, which contain all the information available from the data. We develop a parametric model that can incorporate covariates and individual-level data, and then propose a nonparametric model based on a Dirichlet process prior, which relaxes the arbitrary distributional assumption of the parametric model. Finally, through simulations and an empirical application, we evaluate the performance of these models in comparison with existing methods.
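
For readers unfamiliar with the deterministic bounds, the sketch below computes the classical method of bounds for a single 2 x 2 table. It assumes the usual notation, which may differ from the dissertation's: x is the share of a unit in group 1, t the share with the outcome, and w1, w2 the unobserved within-group rates, linked by the accounting identity t = w1 * x + w2 * (1 - x). Solving that identity for one rate while letting the other range over [0, 1] gives the bounds. The function name unit_bounds is hypothetical and is not part of eco.

```python
from typing import Tuple

Interval = Tuple[float, float]


def unit_bounds(x: float, t: float) -> Tuple[Interval, Interval]:
    """Deterministic bounds on (w1, w2) for one 2 x 2 table (hypothetical helper).

    x: share of the unit in group 1 (0 < x < 1)
    t: share of the unit with the outcome (0 <= t <= 1)
    Derived from t = w1 * x + w2 * (1 - x) with w1, w2 restricted to [0, 1].
    """
    if not (0.0 < x < 1.0 and 0.0 <= t <= 1.0):
        raise ValueError("need 0 < x < 1 and 0 <= t <= 1")
    # w1 = (t - w2 * (1 - x)) / x, with w2 running from 1 (lower end) to 0 (upper end).
    w1 = (max(0.0, (t - (1.0 - x)) / x), min(1.0, t / x))
    # w2 = (t - w1 * x) / (1 - x), with w1 running from 1 (lower end) to 0 (upper end).
    w2 = (max(0.0, (t - x) / (1.0 - x)), min(1.0, t / (1.0 - x)))
    return w1, w2


# Example: a unit that is 25% group 1 with 50% overall turnout.
(w1_lo, w1_hi), (w2_lo, w2_hi) = unit_bounds(0.25, 0.5)
print(w1_lo, w1_hi)  # 0.0 1.0 -> the data say nothing about group 1 here
print(round(w2_lo, 3), round(w2_hi, 3))  # 0.333 0.667
```

The example shows why the bounds alone rarely settle the question: the same table can be completely uninformative about one group while narrowing the other group's rate to a third of the unit interval.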

In Chapter 3, we formally define the ecological inference problem as a coarse data problem and apply the related assumptions and theoretical results to it. Under this framework, one can identify three key issues affecting ecological inference: distributional, contextual, and aggregation effects. Using an EM algorithm and its extension, the model can formally quantify the information lost through aggregation. We then extend the models proposed in Chapter 2 to accommodate data that are not coarsened at random; by controlling for the coarsening process, one can make valid ecological inferences. The chapter concludes with simulations and empirical applications that assess model performance.
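
To make the incomplete data view concrete, here is a toy EM sketch under a deliberately simplified model that is an assumption of this example, not the dissertation's model: in each unit, n1 members of group 1 and n2 members of group 2 have the outcome independently with rates p1 and p2 that are constant across units, and only the unit total t is reported. The split of t between the two groups is the missing data. The E-step computes each unit's expected group-1 count given t (its feasible range is the integer version of the deterministic bounds), and the M-step applies the complete-data binomial estimates. All names (simulate_units, split_distribution, em_binomial) are hypothetical and are not eco's API.

```python
import math
import random
from typing import List, Tuple

Unit = Tuple[int, int, int]  # (n1, n2, t): group sizes and the observed total count


def simulate_units(num_units: int, n: int, p1: float, p2: float, seed: int) -> List[Unit]:
    """Hypothetical data generator: each unit reports only y1 + y2, never the split."""
    rng = random.Random(seed)
    units = []
    for _ in range(num_units):
        n1 = rng.randint(n // 10, n - n // 10)  # group composition varies across units
        y1 = sum(rng.random() < p1 for _ in range(n1))
        y2 = sum(rng.random() < p2 for _ in range(n - n1))
        units.append((n1, n - n1, y1 + y2))
    return units


def split_distribution(n1: int, n2: int, t: int, p1: float, p2: float) -> Tuple[List[int], List[float]]:
    """Joint P(Y1 = k, Y2 = t - k) for every feasible k, i.e. max(0, t - n2) <= k <= min(n1, t)."""
    ks = list(range(max(0, t - n2), min(n1, t) + 1))
    joint = [
        math.comb(n1, k) * p1**k * (1 - p1) ** (n1 - k)
        * math.comb(n2, t - k) * p2 ** (t - k) * (1 - p2) ** (n2 - t + k)
        for k in ks
    ]
    return ks, joint


def em_binomial(units: List[Unit], p1: float = 0.5, p2: float = 0.5,
                tol: float = 1e-8, max_iter: int = 1000) -> Tuple[float, float, List[float]]:
    """EM for the toy constant-rate model; returns (p1, p2, log-likelihood trace)."""
    n1_sum = sum(u[0] for u in units)
    n2_sum = sum(u[1] for u in units)
    t_sum = sum(u[2] for u in units)
    loglik: List[float] = []
    for _ in range(max_iter):
        ey1_sum, ll = 0.0, 0.0
        for n1, n2, t in units:
            ks, joint = split_distribution(n1, n2, t, p1, p2)
            total = sum(joint)  # P(Y1 + Y2 = t): this unit's observed-data likelihood
            ll += math.log(total)
            # E-step: expected hidden group-1 count given the observed total.
            ey1_sum += sum(k * j for k, j in zip(ks, joint)) / total
        loglik.append(ll)  # observed-data log-likelihood at the current (p1, p2)
        # M-step: complete-data binomial MLEs with expected counts plugged in.
        new_p1 = ey1_sum / n1_sum
        new_p2 = (t_sum - ey1_sum) / n2_sum
        done = abs(new_p1 - p1) + abs(new_p2 - p2) < tol
        p1, p2 = new_p1, new_p2
        if done:
            break
    return p1, p2, loglik


if __name__ == "__main__":
    units = simulate_units(200, 40, 0.7, 0.3, seed=1)
    p1_hat, p2_hat, trace = em_binomial(units)
    print(f"p1 ~ {p1_hat:.3f}, p2 ~ {p2_hat:.3f}, iterations: {len(trace)}")
```

Two standard properties of EM are visible here: the observed-data log-likelihood never decreases from one iteration to the next, and convergence slows as the fraction of missing information grows, which is one reason EM-based procedures can be used to gauge how much information aggregation destroys.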

Finally, in Chapter 4, we discuss the computational details of fitting the models introduced in the previous chapters. We use Markov chain Monte Carlo algorithms to estimate the Bayesian models; in particular, we describe the Gibbs samplers used for posterior simulation and inference under the various Bayesian models. In addition, we develop EM and SEM algorithms to compute maximum likelihood estimates. Lastly, we present eco, a publicly available R package that implements these estimation procedures, along with its software manual.
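
A matching data augmentation Gibbs sampler for the same hypothetical constant-rate binomial model is sketched below; it is not the dissertation's sampler, which targets richer parametric and Dirichlet process models, and it is not eco's API. The sampler alternates between imputing each unit's hidden group-1 count from its conditional distribution given the observed total and drawing (p1, p2) from conjugate Beta conditionals. The Beta(a, b) priors, the burn-in length, and all function names are assumptions of this sketch.

```python
import math
import random
from typing import List, Tuple

Unit = Tuple[int, int, int]  # (n1, n2, t): group sizes and the observed total count


def simulate_units(num_units: int, n: int, p1: float, p2: float, seed: int) -> List[Unit]:
    """Hypothetical data generator: each unit reports only y1 + y2, never the split."""
    rng = random.Random(seed)
    units = []
    for _ in range(num_units):
        n1 = rng.randint(n // 10, n - n // 10)
        y1 = sum(rng.random() < p1 for _ in range(n1))
        y2 = sum(rng.random() < p2 for _ in range(n - n1))
        units.append((n1, n - n1, y1 + y2))
    return units


def split_distribution(n1: int, n2: int, t: int, p1: float, p2: float) -> Tuple[List[int], List[float]]:
    """Unnormalized conditional of the hidden count Y1 given Y1 + Y2 = t, over feasible k."""
    ks = list(range(max(0, t - n2), min(n1, t) + 1))
    joint = [
        math.comb(n1, k) * p1**k * (1 - p1) ** (n1 - k)
        * math.comb(n2, t - k) * p2 ** (t - k) * (1 - p2) ** (n2 - t + k)
        for k in ks
    ]
    return ks, joint


def gibbs_binomial(units: List[Unit], iters: int = 600, burn_in: int = 100,
                   a: float = 1.0, b: float = 1.0, seed: int = 0) -> List[Tuple[float, float]]:
    """Data augmentation Gibbs sampler; returns post-burn-in draws of (p1, p2)."""
    rng = random.Random(seed)
    n1_sum = sum(u[0] for u in units)
    n2_sum = sum(u[1] for u in units)
    t_sum = sum(u[2] for u in units)
    p1, p2 = 0.5, 0.5
    draws: List[Tuple[float, float]] = []
    for it in range(iters):
        # Imputation step: draw each unit's hidden group-1 count given the current rates.
        y1_sum = 0
        for n1, n2, t in units:
            ks, joint = split_distribution(n1, n2, t, p1, p2)
            y1_sum += rng.choices(ks, weights=joint)[0]
        y2_sum = t_sum - y1_sum
        # Posterior step: conjugate Beta updates given the imputed complete data.
        p1 = rng.betavariate(a + y1_sum, b + n1_sum - y1_sum)
        p2 = rng.betavariate(a + y2_sum, b + n2_sum - y2_sum)
        if it >= burn_in:
            draws.append((p1, p2))
    return draws


if __name__ == "__main__":
    draws = gibbs_binomial(simulate_units(120, 40, 0.7, 0.3, seed=1))
    print(sum(d[0] for d in draws) / len(draws), sum(d[1] for d in draws) / len(draws))
```

Because each imputation step reuses the same conditional distribution as the EM E-step above, the two algorithms are natural companions: EM returns a point estimate, while the sampler's draws summarize posterior uncertainty, and both slow down when aggregation hides most of the information.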

Indexing

Subject: Political science; Social research
Classification: 0615: Political science; 0344: Social research; 0463: Statistics
Identifier / keyword: Social sciences; Pure sciences; Data augmentation; Ecological inference; Incomplete data; Statistical modeling
Author: Lu, Ying
Source: DAI-A 66/03, Dissertation Abstracts International
Place of publication: Ann Arbor
Country of publication: United States
ISBN: 9780542059865, 054205986X
Advisor: Imai, Kosuke
University/institution: Princeton University
University location: United States -- New Jersey
Source type: Dissertations & Theses
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.