A sensory-motor linguistic framework for human activity understanding
We empirically discovered that the space of human actions has a linguistic structure. This sensory-motor space consists of the evolution of the joint angles of the human body in movement, and it has its own phonemes, morphemes, and sentences. We present a Human Activity Language (HAL) for the symbolic, non-arbitrary representation of the sensory and motor information of human activity. This language was learned from large amounts of motion capture data.
Kinetology, the phonology of human movement, finds basic primitives for human motion (segmentation) and associates them with symbols (symbolization). This way, kinetology provides a symbolic representation for human movement that allows synthesis, analysis, and symbolic manipulation. We introduce a kinetological system and propose five basic principles on which such a system should be based: compactness, view-invariance, reproducibility, selectivity, and reconstructivity. We demonstrate the kinetological properties of our sensory-motor primitives. Further evaluation is accomplished with experiments on compression and decompression of motion data.
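As a rough illustration of segmentation followed by symbolization, the sketch below splits one joint-angle sequence into runs over which the signs of the discrete velocity and acceleration stay constant, and labels each run with a symbol. The sign-based criterion, the function name `segment_and_symbolize`, and the symbol table `SYMBOLS` are assumptions made for this example; the abstract does not specify the segmentation rule.

```python
from typing import List, Tuple

# Hypothetical symbol table: one symbol per (velocity >= 0, acceleration >= 0) state.
SYMBOLS = {
    (True, True): "A",    # angle increasing, speeding up
    (True, False): "B",   # angle increasing, slowing down
    (False, False): "C",  # angle decreasing, acceleration negative
    (False, True): "D",   # angle decreasing, acceleration non-negative
}


def segment_and_symbolize(angles: List[float]) -> List[Tuple[str, int, int]]:
    """Return (symbol, first_index, last_index) runs for one joint-angle sequence.

    Assumption: a segment is a maximal run of samples sharing the same sign of
    the finite-difference velocity and acceleration. Zero counts as non-negative.
    """
    # Finite-difference approximations of the first and second derivatives.
    vel = [b - a for a, b in zip(angles, angles[1:])]
    acc = [b - a for a, b in zip(vel, vel[1:])]

    segments: List[Tuple[str, int, int]] = []
    for i, (v, a) in enumerate(zip(vel, acc)):
        sym = SYMBOLS[(v >= 0, a >= 0)]
        if segments and segments[-1][0] == sym:
            # Same state as the previous sample: extend the current segment.
            segments[-1] = (sym, segments[-1][1], i)
        else:
            # State changed: start a new segment at this sample.
            segments.append((sym, i, i))
    return segments


if __name__ == "__main__":
    runs = segment_and_symbolize([0, 1, 3, 6, 8, 9, 9, 8, 6, 3])
    print("".join(sym for sym, _, _ in runs))
```

Concatenating the symbols of successive runs gives a string for the joint angle, which is the kind of symbolic form that later analysis or compression could operate on.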
The morphology of a human action relates to the inference of essential parts of movement (morpho-kinetology) and its structure (morpho-syntax). To learn morphemes and their structure, we present a grammatical inference methodology and introduce a parallel learning algorithm to induce a grammar system representing a single action. The algorithm infers the components of the grammar system: a subset of essential actuators, a context-free grammar (CFG) for the language of each component, representing the motion pattern performed by a single actuator, and synchronization rules modeling coordination among actuators.
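To make the idea of inducing a CFG from one actuator's symbol string concrete, here is a minimal sketch that repeatedly replaces the most frequent adjacent pair of symbols with a new nonterminal. This is a generic pair-replacement heuristic swapped in for illustration, not the parallel learning algorithm described above; it handles a single string and does not select essential actuators or infer synchronization rules. The names `induce_grammar`, `expand`, and the nonterminal labels `N1`, `N2`, ... are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Rules = Dict[str, Tuple[str, str]]


def induce_grammar(sequence: Sequence[str]) -> Tuple[List[str], Rules]:
    """Induce binary CFG rules from one symbol string by pair replacement.

    Assumption: terminal symbols never look like the generated names N1, N2, ...
    Returns the compressed start string and the rule table.
    """
    seq = list(sequence)
    rules: Rules = {}
    next_id = 1
    while True:
        # Count adjacent pairs (overlapping occurrences are counted too).
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:
            break  # no repeated pair left, so nothing more to factor out
        name = f"N{next_id}"
        next_id += 1
        rules[name] = pair
        # Replace non-overlapping occurrences of the pair, scanning left to right.
        out: List[str] = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbols: Sequence[str], rules: Rules) -> List[str]:
    """Rewrite nonterminals recursively until only terminals remain."""
    out: List[str] = []
    for s in symbols:
        if s in rules:
            out.extend(expand(rules[s], rules))
        else:
            out.append(s)
    return out


if __name__ == "__main__":
    start, rules = induce_grammar("ABABAB")
    print(start, rules)
    print("".join(expand(start, rules)))
```

Expanding the start string with the induced rules reproduces the input, so the grammar is a lossless, more compact description of the repeated motion pattern.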
The syntax of human activities involves the construction of sentences using action morphemes. A sentence may range from a single action morpheme (nuclear syntax) to a sequence of sets of morphemes. A single morpheme is decomposed into analogs of lexical categories: nouns, adjectives, verbs, and adverbs. The sets of morphemes represent simultaneous actions (parallel syntax), and a sequence of movements corresponds to the concatenation of activities (sequential syntax).
We demonstrate this linguistic framework on real motion capture data from a large-scale database containing around 200 different actions corresponding to English verbs associated with voluntary, meaningful, observable movement.