PSLR(1): Pseudo-scannerless minimal LR(1) for the deterministic parsing of composite languages
Composite languages consist of multiple sub-languages. Examples include the parser specification languages read by parser generators such as Yacc, modern extensible languages with complex layers of domain-specific sub-languages, and even traditional programming languages such as C and C++. In this dissertation, we describe PSLR(1), a new scanner-based LR(1) parser generation system that automatically eliminates the scanner conflicts typically caused by language composition. The fundamental premise of PSLR(1) is the pseudo-scanner, a scanner that recognizes only those tokens accepted by the current parser state. However, using a pseudo-scanner raises several unique challenges, for which we describe a novel set of solutions. One major challenge is that practical LR(1) parser table generation algorithms merge parser states, which can induce incorrect pseudo-scanner behavior, including new conflicts. Our solution is a new extension of IELR(1), an algorithm we have previously described for generating minimal LR(1) parser tables. Our other contributions include a robust system for handling the remaining scanner conflicts, a correction for syntax error handling mechanisms that parser state merging also corrupts, and a mechanism for scoping syntactic declarations to further improve the modularity of sub-language specifications. Although other researchers have independently described the premise of the pseudo-scanner, we expect our improvements to distinguish PSLR(1) as a significantly more robust scanner-based parser generation system for both traditional and modern composite languages.
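The pseudo-scanner premise can be illustrated with a small sketch. It assumes a hypothetical toy composite language of `key = raw text` lines, where the value after `=` belongs to a second, raw-text sub-language. The names `TOKENS`, `ACTION`, `pseudo_scan`, and `parse` are illustrative only. The automaton is written by hand rather than generated by IELR(1) or any other LR(1) algorithm, and the longest-match tie-breaking is an assumption, not PSLR(1)'s actual conflict-handling policy. A conventional scanner that considers every token picks `RAW` at the start of a line, because `RAW` yields the longest match there, and this is a scanner conflict. The pseudo-scanner considers only the tokens the current state accepts, so it picks `ID`.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical token definitions for a toy composite language of
# "key = raw text" lines. Declaration order breaks ties between
# equal-length matches (an assumption made for this sketch).
TOKENS: Dict[str, re.Pattern] = {
    "ID": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
    "EQ": re.compile(r"[ \t]*=[ \t]*"),
    "RAW": re.compile(r"[^\n]+"),  # the raw-text sub-language
    "NL": re.compile(r"\n"),
}

# Hand-written toy automaton (not produced by any LR table generator):
# ACTION[state][token] is the next state, and the keys of ACTION[state]
# are exactly the tokens acceptable in that state.
ACTION: Dict[int, Dict[str, int]] = {
    0: {"ID": 1},   # start of line: expect a key
    1: {"EQ": 2},   # after the key: expect '='
    2: {"RAW": 3},  # after '=': expect raw text
    3: {"NL": 0},   # after the raw text: expect a newline
}
ACCEPTING = {0, 3}  # states in which end of input is allowed


def longest_match(text: str, pos: int, names: List[str]) -> Optional[Tuple[str, str]]:
    """Return (token, lexeme) for the longest non-empty match among `names`."""
    best: Optional[Tuple[str, str]] = None
    for name in names:
        m = TOKENS[name].match(text, pos)
        # Strict '>' keeps the earlier-declared token on equal-length ties.
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


def conventional_scan(text: str, pos: int) -> Optional[Tuple[str, str]]:
    """A traditional scanner: considers every token, ignoring parser state."""
    return longest_match(text, pos, list(TOKENS))


def pseudo_scan(text: str, pos: int, state: int) -> Optional[Tuple[str, str]]:
    """A pseudo-scanner: considers only tokens acceptable in `state`."""
    return longest_match(text, pos, list(ACTION[state]))


def parse(text: str) -> List[Tuple[str, str]]:
    """Drive the toy automaton, requesting one token at a time."""
    state, pos, tokens = 0, 0, []
    while pos < len(text):
        tok = pseudo_scan(text, pos, state)
        if tok is None:
            raise SyntaxError(f"no acceptable token at offset {pos} in state {state}")
        tokens.append(tok)
        pos += len(tok[1])
        state = ACTION[state][tok[0]]
    if state not in ACCEPTING:
        raise SyntaxError(f"unexpected end of input in state {state}")
    return tokens


if __name__ == "__main__":
    src = "name = hello world\nx=1 = 2\n"
    print(conventional_scan(src, 0))  # the conflict: RAW swallows the whole line
    print(parse(src))                 # the pseudo-scanner tokenizes each sub-language
```

For example, `parse("name = hello world\n")` yields `ID`, `EQ`, `RAW`, and `NL` tokens, while a conventional scanner at offset 0 returns `RAW` for the entire line. In a real system, the acceptable-token sets would come from the generated parser tables. This is why the abstract notes that merging parser states can corrupt the pseudo-scanner's behavior: merging states can change which tokens a state appears to accept.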