# Abstract/Details

## Statistical monitoring and cluster detection under naturally occurring heterogeneous dichotomous events

2011

### Abstract (summary)

Many processes produce a count statistic that is a sum of multiple non-homogeneous dichotomous random variables, that is, variables with different values of the Bernoulli parameter *p*. The probability distribution of this count statistic is the convolution of *J* non-identical binomial distributions and can differ significantly from its binomial and normal counterparts. In such cases the homogeneity assumption can lead to incorrect probability calculations and incorrect conclusions from statistical procedures such as control charts, sequential probability ratio tests, and cluster detection via scan statistics. Use of the exact (*J*-binomial) distribution, however, can require prohibitively expensive calculations as the number (*J*) of non-identical binomial random variables in the convolution increases.
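As an illustration of the convolution described above, the following is a minimal Python sketch of the exact *J*-binomial probability mass function. The function names and the example groups `(n_j, p_j)` are hypothetical and do not come from the dissertation; the direct convolution shown here is only one straightforward way to obtain the exact distribution.

```python
from math import comb

def binomial_pmf(n, p):
    """PMF of Binomial(n, p) as a list indexed by the count 0..n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """Discrete convolution of two PMFs (distribution of an independent sum)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def j_binomial_pmf(groups):
    """Exact PMF of the sum of independent Binomial(n_j, p_j) variables.

    groups: list of (n_j, p_j) pairs, one per homogeneous subgroup.
    """
    pmf = [1.0]  # PMF of the constant 0
    for n, p in groups:
        pmf = convolve(pmf, binomial_pmf(n, p))
    return pmf

# Hypothetical example: three subgroups with different event probabilities.
groups = [(10, 0.05), (5, 0.30), (20, 0.10)]
pmf = j_binomial_pmf(groups)
mean = sum(k * pk for k, pk in enumerate(pmf))
```

Each convolution step costs on the order of the product of the two PMF lengths, which gives a sense of why exact calculation becomes expensive as *J* and the group sizes grow.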

Motivated by these issues, this dissertation has three foci. The first is testing and monitoring heterogeneous processes over time. Risk-adjusted sequential probability ratio tests (SPRTs) and resetting SPRT charts are derived. Their accuracy and detection performance (average run lengths and operating characteristic curves) are compared with those of procedures that assume homogeneity and are shown to be significantly better in some applications.
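The abstract does not give the form of the risk-adjusted SPRT. As a hedged sketch of the general idea, the Python below uses a common odds-ratio formulation: each observation has its own baseline risk, the null hypothesis is an odds ratio of 1, and the alternative is a fixed odds ratio `odds_ratio`. The function name, parameter names, and default values are assumptions made for illustration, not the dissertation's procedure.

```python
from math import log

def risk_adjusted_sprt(outcomes, risks, odds_ratio=2.0, alpha=0.05, beta=0.05):
    """Sequential test of H0: odds ratio = 1 against H1: odds ratio = odds_ratio.

    outcomes: iterable of 0/1 events
    risks:    iterable of baseline event probabilities, one per observation
    Returns (decision, number of observations used, final log-likelihood ratio).
    """
    upper = log((1 - beta) / alpha)   # Wald's approximate rejection boundary
    lower = log(beta / (1 - alpha))   # Wald's approximate acceptance boundary
    llr = 0.0
    t = 0
    for t, (y, p) in enumerate(zip(outcomes, risks), start=1):
        # Log-likelihood ratio increment for one Bernoulli outcome with
        # observation-specific baseline risk p.
        llr += y * log(odds_ratio) - log(1 - p + odds_ratio * p)
        if llr >= upper:
            return "reject H0", t, llr
        if llr <= lower:
            return "accept H0", t, llr
    return "continue", t, llr
```

A resetting chart, in this reading, would restart the statistic at zero after each decision and keep monitoring; that detail is likewise an assumption.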

The second focus area is the detection of geographical clusters via scan statistics in the presence of natural heterogeneity. Two risk-adjusted models of Kulldorff's Bernoulli scan statistic are developed, one based on the product of risk-adjusted probabilities (*J*-Bernoulli model) and one based on the distribution of heterogeneity (*J*-binomial model), and their performance is compared with that of the conventional method.

Monte Carlo performance analyses show that the risk-adjusted models lead to better inferences, detection times, and detection probabilities over a variety of scenarios, and provide insights for the selection and use of correct methodologies when heterogeneous dichotomous events occur.

The third problem addresses computational issues of *J*-binomial distributions. Computing these probabilities is important in many applications, especially since each of the above-mentioned methods requires tens to thousands of *J*-binomial probability calculations. The accuracy of *J*-binomial probability estimates obtained via cumulant-based expansions that use orthogonal polynomials and via saddle point approximations is explored by comparison with both exact probabilities and Monte Carlo estimates (MCE). A normalized Gram-Charlier expansion (NGCE) and saddle point approximations are shown to produce the most accurate results and to be more time-efficient than computing the exact probabilities or the MCE. The NGCE algorithm is practical, known to produce an estimate under all scenarios, and of great value to analysts since it can easily be integrated into computer codes.
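The abstract does not spell out the NGCE algorithm. The Python sketch below shows only the general shape of a cumulant-based approximation: it sums the first four binomial cumulants over the groups (cumulants of independent variables add), evaluates a fourth-order Gram-Charlier type A series at each integer count, and then clips negative values and rescales so the result sums to one. Treating "normalized" as clip-and-rescale is an assumption, as are the function names and the truncation order.

```python
from math import exp, pi, sqrt

def j_binomial_cumulants(groups):
    """First four cumulants of a sum of independent Binomial(n_j, p_j)."""
    k1 = k2 = k3 = k4 = 0.0
    for n, p in groups:
        q = 1.0 - p
        npq = n * p * q
        k1 += n * p                       # mean
        k2 += npq                         # variance
        k3 += npq * (q - p)               # third cumulant
        k4 += npq * (1.0 - 6.0 * p * q)   # fourth cumulant
    return k1, k2, k3, k4

def gram_charlier_pmf(groups):
    """Approximate PMF on 0..sum(n_j) from a Gram-Charlier type A series."""
    k1, k2, k3, k4 = j_binomial_cumulants(groups)
    sigma = sqrt(k2)
    skew = k3 / sigma**3
    excess_kurtosis = k4 / sigma**4
    total_n = sum(n for n, _ in groups)
    raw = []
    for k in range(total_n + 1):
        z = (k - k1) / sigma
        he3 = z**3 - 3 * z                # Hermite polynomial He3
        he4 = z**4 - 6 * z**2 + 3         # Hermite polynomial He4
        density = exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))
        value = density * (1 + skew / 6 * he3 + excess_kurtosis / 24 * he4)
        raw.append(max(value, 0.0))       # assumed handling of negative values
    total = sum(raw)
    return [v / total for v in raw]       # assumed normalization step

# Hypothetical example groups (n_j, p_j).
approx = gram_charlier_pmf([(10, 0.05), (5, 0.30), (20, 0.10)])
```

The cost is linear in the number of groups plus the number of counts evaluated, in contrast to repeated convolution for the exact distribution.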

### Indexing (details)

Industrial engineering

0546: Industrial engineering