Mathematical and computational aspects of lexicalized grammars
Most current linguistic theories give lexical accounts of several phenomena that used to be considered purely syntactic. The information put in the lexicon thereby increases in both amount and complexity. We explore the view that syntactic rules are not separated from lexical items. In this approach, each elementary structure is associated with a lexical item called the anchor. These structures specify extended domains of locality (as compared to context-free grammars) over which constraints can be stated. The 'grammar' consists of a lexicon in which each lexical item is associated with a finite number of structures for which that item is the anchor. 'Rules' tell us how these structures are composed. A grammar of this form is said to be lexicalized.
The process of lexicalization of context-free grammars (CFGs) constrained by linguistic requirements forces us to use operations for combining structures that make the formalism fall into the class of mildly context-sensitive languages. We show that substitution, the combining operation corresponding to CFGs, does not allow one to lexicalize CFGs, but the combination of substitution and adjunction does. We show how tree-adjoining grammar (TAG) is derived from the lexicalization process of CFGs. We then show that TAGs are closed under lexicalization, and we illustrate the main structures found in a lexicalized TAG for English. The properties of TAGs permit us to encapsulate diverse syntactic phenomena in a very natural way. TAG's extended domain of locality and its factoring of recursion from local dependencies enable us to localize many syntactic dependencies (such as filler-gap) as well as semantic dependencies (such as predicate-argument).
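The two combining operations can be made concrete on a toy example. Below is a minimal sketch of substitution and adjunction over a simple node representation; the names (Node, substitute, adjoin) and the example trees are illustrative, not taken from the dissertation:

```python
class Node:
    def __init__(self, label, children=None, subst=False, foot=False):
        self.label = label               # nonterminal symbol or anchor word
        self.children = children or []   # an empty list marks a leaf
        self.subst = subst               # True: substitution site (a marked leaf)
        self.foot = foot                 # True: foot node of an auxiliary tree

def find_foot(tree):
    """Locate the foot node of an auxiliary tree."""
    if tree.foot:
        return tree
    for child in tree.children:
        f = find_foot(child)
        if f is not None:
            return f
    return None

def substitute(site, initial):
    """Substitution: replace a marked leaf with an initial tree of the same label."""
    assert site.subst and site.label == initial.label
    site.children = initial.children
    site.subst = False

def adjoin(node, auxiliary):
    """Adjunction: splice an auxiliary tree in at an internal node of the same
    label; the node's original subtrees move under the auxiliary tree's foot."""
    foot = find_foot(auxiliary)
    assert node.label == auxiliary.label == foot.label
    foot.children, node.children = node.children, auxiliary.children
    foot.foot = False

def frontier(tree):
    """Left-to-right sequence of leaf labels (the derived string)."""
    if not tree.children:
        return [tree.label]
    return [w for child in tree.children for w in frontier(child)]

# Elementary trees anchored by "walks", "John", and "really".
np_site = Node("NP", subst=True)
alpha_walks = Node("S", [np_site, Node("VP", [Node("V", [Node("walks")])])])
alpha_john = Node("NP", [Node("John")])
beta_really = Node("VP", [Node("Adv", [Node("really")]), Node("VP", foot=True)])

substitute(np_site, alpha_john)                 # NP site <- "John" tree
adjoin(alpha_walks.children[1], beta_really)    # adjoin "really" at VP
print(frontier(alpha_walks))                    # ['John', 'really', 'walks']
```

Note how adjunction inserts material in the middle of an elementary tree while substitution only fills a leaf; this is the extra power that makes lexicalization of CFGs possible.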
We investigate the processing of lexicalized TAGs. We first present two general practical parsers that follow Earley-style parsing. They are practical parsers for TAGs because, as for CFGs, the average behavior of Earley-type parsers is superior to their worst-case complexity. Both are left-to-right, bottom-up parsers that use top-down prediction, but they differ in the way the top-down prediction is used.
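The TAG parsers themselves manipulate dotted trees rather than dotted rules, which is beyond a short sketch; as a point of reference, here is the CFG case they generalize, a plain Earley recognizer with the same predict/scan/complete structure (the function name and grammar encoding are illustrative):

```python
def earley_recognize(grammar, start, words):
    """Earley chart recognizer for a CFG; items are (lhs, rhs, dot, origin).
    `grammar` maps each nonterminal to a list of right-hand-side tuples."""
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                      # predict (top-down)
                    for alt in grammar[sym]:
                        item = (sym, alt, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
                elif i < n and words[i] == sym:         # scan a terminal
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                       # complete (bottom-up)
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        item = (l2, r2, d2 + 1, o2)
                        if item not in chart[i]:
                            chart[i].add(item)
                            agenda.append(item)
    return any(l == start and d == len(r) and o == 0
               for l, r, d, o in chart[n])

toy_grammar = {
    "S":  [("NP", "VP")],
    "NP": [("John",)],
    "VP": [("walks",)],
}
```

For example, `earley_recognize(toy_grammar, "S", ["John", "walks"])` succeeds while the reversed string fails; the TAG versions replace dotted rules with dot positions around tree nodes so that adjunction, too, can be predicted and completed.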
We then explain how to build a set of deterministic, bottom-up, left-to-right parsers that analyze a subset of the tree-adjoining languages. The LR parsing strategy for CFGs is extended to TAGs by using a machine, called the Bottom-up Embedded Push Down Automaton (BEPDA), that recognizes in a bottom-up fashion exactly the set of tree-adjoining languages.
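For a sense of the language class the BEPDA captures: the language {aⁿbⁿcⁿdⁿ} is a canonical tree-adjoining language that no CFG generates. The sketch below is a direct counting recognizer for it, an illustration of the class only, not a simulation of the automaton's stack-of-stacks mechanics:

```python
import re

def in_anbncndn(s):
    """Recognize {a^n b^n c^n d^n : n >= 0}, a tree-adjoining language
    beyond context-free power (a direct check, not a BEPDA simulation)."""
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    # accept iff the string has the shape a*b*c*d* with all four runs equal
    return m is not None and len({len(g) for g in m.groups()}) == 1
```

Four mutually constrained counts are exactly what adjunction's cross-serial power provides and what an embedded pushdown store tracks.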
Finally, we show how lexicalized grammars suggest a natural two-step parsing strategy. We consider lexicalized TAGs as an instance of lexicalized grammar and examine the effect of the two-step parsing strategy on the main types of parsing algorithms.
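The first of the two steps can be sketched in a few lines: because every elementary structure is anchored by a lexical item, the input sentence itself selects the relevant slice of the grammar before any parsing begins. The lexicon and tree names below are hypothetical:

```python
def select_structures(lexicon, sentence):
    """Step 1: keep only the elementary structures whose anchor occurs
    in the input; step 2 then parses with this reduced, input-specific
    grammar instead of the full lexicon."""
    return {word: lexicon[word] for word in set(sentence) if word in lexicon}

# hypothetical lexicon mapping anchors to names of their elementary trees
lexicon = {
    "John":   ["alpha_NP_John"],
    "walks":  ["alpha_S_NP_walks"],
    "really": ["beta_VP_really"],
    "eats":   ["alpha_S_NP_eats_NP"],
}

selected = select_structures(lexicon, "John really walks".split())
# only the structures anchored by "John", "really", and "walks" survive;
# the tree anchored by "eats" is filtered out before parsing starts
```

The payoff is that the parser in step 2 never considers structures irrelevant to the input, which shrinks the effective grammar for every major parsing algorithm.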