Abstract/Details

Approximation algorithms for grammar -based data compression

Lehman, Eric Allen.   Massachusetts Institute of Technology ProQuest Dissertations Publishing,  2002. 0804007.

Abstract (summary)

This thesis considers the smallest grammar problem: find the smallest context-free grammar that generates exactly one given string. We show that this problem is intractable, and so our objective is to find approximation algorithms. This simple question is connected to many areas of research. Most importantly, there is a link to data compression; instead of storing a long string, one can store a small grammar that generates it. A small grammar for a string also naturally brings out underlying patterns, a fact that is useful, for example, in DNA analysis. Moreover, the size of the smallest context-free grammar generating a string can be regarded as a computable relaxation of Kolmogorov complexity. Finally, work on the smallest grammar problem qualitatively extends the study of approximation algorithms to hierarchically-structured objects. In this thesis, we establish hardness results, evaluate several previously proposed algorithms, and then present new procedures with much stronger approximation guarantees. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

Indexing (details)


Subject
Computer science
Classification
0984: Computer science
Identifier / keyword
Applied sciences; Approximation algorithms; Data compression; Grammar-based
Title
Approximation algorithms for grammar -based data compression
Author
Lehman, Eric Allen
Number of pages
0
Degree date
2002
School code
0753
Source
DAI-B 63/07, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
Advisor
Sudan, Madhu
University/institution
Massachusetts Institute of Technology
University location
United States -- Massachusetts
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
0804007
ProQuest document ID
305445788
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
https://www.proquest.com/docview/305445788