Compression of input data in machine learning from examples

Than, Soe. University of Kansas ProQuest Dissertations Publishing, 1994. 9508681.

Abstract (summary)

Compression of data is a useful way to economize on computing resources. In machine learning from examples, input data takes the form of a decision table. Generally, a decision table consists of a set of examples, a set of attributes, a decision variable, a set of values, and a function that returns a value for each pair consisting of an example and either an attribute or the decision. Naturally, large decision tables make learning difficult. This paper presents a methodology to compress a decision table using a triple of partitions on the sets of examples, attributes, and attribute values, respectively.
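The components of a decision table listed above can be illustrated with a minimal Python sketch. The class name `DecisionTable`, its field names, and the three-example toy data are assumptions made for illustration only; they do not come from the dissertation.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Set, Tuple


@dataclass
class DecisionTable:
    """Hypothetical container for the components named in the abstract."""
    examples: Tuple[str, ...]
    attributes: Tuple[str, ...]
    decision: str
    # Maps (example, attribute-or-decision) to a value.
    cells: Dict[Tuple[str, str], Hashable]

    def value(self, example: str, column: str) -> Hashable:
        """Return the value for an example and an attribute or the decision."""
        return self.cells[(example, column)]

    def values(self) -> Set[Hashable]:
        """Return the set of all values that occur in the table."""
        return set(self.cells.values())


# Invented toy data: two attributes and one decision column.
columns = ("temperature", "headache", "flu")
rows = {
    "e1": ("high", "yes", "yes"),
    "e2": ("normal", "no", "no"),
    "e3": ("high", "no", "yes"),
}
table = DecisionTable(
    examples=tuple(rows),
    attributes=("temperature", "headache"),
    decision="flu",
    cells={(e, c): v for e, row in rows.items() for c, v in zip(columns, row)},
)
```

Here `table.value("e2", "temperature")` looks up a single cell, which plays the role of the function described in the abstract.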

The compression is accomplished by grouping together examples, attributes, and attribute values into blocks such that blocks of examples and blocks of attributes are transformed into blocks of attribute values in the same way their members are transformed in the original decision table. Thus the original decision table is compressed into a smaller decision table while preserving the original structure.
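The structure-preserving condition described above can be sketched in Python as follows. This is one reading of the abstract, not the dissertation's algorithm: the function names `block_of` and `compress`, the toy data, and the choice to leave out the decision column are all assumptions. A triple of partitions is accepted only if every pair of an example block and an attribute block sends all of its members' values into a single value block.

```python
from itertools import product


def block_of(partition, item):
    """Return the index of the block of `partition` that contains `item`."""
    for index, block in enumerate(partition):
        if item in block:
            return index
    raise KeyError(item)


def compress(cells, example_blocks, attribute_blocks, value_blocks):
    """Build the compressed table {(example block, attribute block): value block}.

    Returns None when some pair of blocks maps into more than one value block,
    i.e. when the triple does not preserve the structure of the original table.
    """
    compressed = {}
    pairs = product(enumerate(example_blocks), enumerate(attribute_blocks))
    for (i, e_block), (j, a_block) in pairs:
        targets = {block_of(value_blocks, cells[(e, a)])
                   for e in e_block for a in a_block}
        if len(targets) != 1:
            return None
        compressed[(i, j)] = targets.pop()
    return compressed


# Invented toy table: three examples, two attributes (decision omitted).
cells = {
    ("e1", "a1"): "low",  ("e1", "a2"): "small",
    ("e2", "a1"): "low",  ("e2", "a2"): "tiny",
    ("e3", "a1"): "high", ("e3", "a2"): "large",
}
example_blocks = [{"e1", "e2"}, {"e3"}]
attribute_blocks = [{"a1"}, {"a2"}]
good_values = [{"low"}, {"high"}, {"small", "tiny"}, {"large"}]
bad_values = [{"low"}, {"high"}, {"small"}, {"tiny"}, {"large"}]

compressed = compress(cells, example_blocks, attribute_blocks, good_values)
rejected = compress(cells, example_blocks, attribute_blocks, bad_values)
```

With `good_values`, examples `e1` and `e2` merge into one row because `small` and `tiny` share a value block, so the three-row table shrinks to two rows. With `bad_values` those two values are kept apart, the merged row is no longer well defined, and `compress` returns `None`.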

Theorems on the underlying algebraic structures of the partition triples and of a special type of partition triple, called an MMm triple, are presented. An algorithm to identify all MMm triples of the decision table is developed. In general, the algorithm finds a large number of MMm triples. Heuristics for finding some good MMm triples are discussed. The quality of the rules induced from the compressed decision table is analyzed. Experiments are conducted on a real-life decision table from the breast cancer domain, and rules induced from the compressed decision tables are found to be simpler than the rules induced from the original decision table.

Indexing (details)


Subject
Computer science
Classification
0984: Computer science
Identifier / keyword
Applied sciences
Title
Compression of input data in machine learning from examples
Author
Than, Soe
Number of pages
101
Degree date
1994
School code
0099
Source
DAI-B 55/11, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
979-8-208-80716-3
University/institution
University of Kansas
University location
United States -- Kansas
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
9508681
ProQuest document ID
304140404
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
https://www.proquest.com/docview/304140404