Abstract/Details

Optimized data compression in the absence of a priori probabilities

Hatton, Edward Douglas.   University of Waterloo (Canada) ProQuest Dissertations Publishing,  1995. NN99808.

Abstract (summary)

This thesis considers the problem of lossless data compression of a finite-length sample from a finite-alphabet random source with unknown symbol probabilities. Under conditions of stationarity, it is demonstrated that there exists an optimal symbol-probability estimate given only the counts of the previously observed symbols and the symbol-alphabet size. It is demonstrated that the traditional measure of storage requirements using Shannon's entropy measure, computed using the relative frequencies of the possible characters, is insufficient in those cases where the distribution of characters is not known in advance and will only asymptotically approach the true per-symbol storage requirements. The absolute error is O(log(N)) where N is the number of characters stored in the particular context. This implies that a variable-order Markov compression model is in general more space efficient than a fixed-length Markov process. The problem of removing memory from a source through a variable-order Markov model is also addressed. An impractical, but optimal, fully dynamic Markov coder is described but not implemented due to the computational complexity of the algorithm for even simple sources. The dynamic symbol probability estimate is combined with a semi-adaptive model in a proposed compression scheme. A noiseless data compression algorithm is derived that utilizes adaptive probability estimates for the outcome characters within a conditioning class but uses a semi-adaptive construct for the model structure. A mechanism is also created by which a locally optimal context order may be derived, which will benefit data with fixed-length records or that are more than one-dimensional in nature. The resulting family of compression schemes, called SAMC is compared to PPMC and other earlier schemes, and it is found to provide the best compression rates for all of the test files chosen.

Indexing (details)


Subject
Electrical engineering
Classification
0544: Electrical engineering
Identifier / keyword
Applied sciences; compression scheme
Title
Optimized data compression in the absence of a priori probabilities
Author
Hatton, Edward Douglas
Number of pages
141
Degree date
1995
School code
1141
Source
DAI-B 56/10, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
978-0-315-99808-7
Advisor
Freeman, George H.
University/institution
University of Waterloo (Canada)
University location
Canada -- Ontario, CA
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
NN99808
ProQuest document ID
304299385
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
https://www.proquest.com/docview/304299385