So, I have run into an interesting problem. In an LALR(1) parser, how do you tell the difference between / for divide and / for starting a regular expression?
The lexer needs to be able to see the difference between these two uses, so maybe it's not the parser's fault? That would require flex (the lexer generator I am currently using) to be aware of the context it is in. Flex does support state-based context (start conditions), but the lexer's own rules have to enter and exit each state; as far as I can tell, the bison-generated parser has no clean way to drive them. So now I have a problem: either I make the lexer do really complicated things, or I use a different kind of parser. Perl 5 has a bison-based grammar, but its lexer is not at all sane. It has to do some tricky things to look around and work out whether a / is a regex or a divide. A simple example is 5 / 5 / 5; If your parser/lexer is not very smart, that might look like an integer, followed by a regular expression, followed by an integer. Or is it an integer divided by an integer divided by an integer? Well, I know that's a divide, not a regular expression. But with my limited knowledge of bison and flex, it is proving difficult to make them understand this.
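To make the "complicated lexer" option concrete, here is a minimal sketch in Python (not flex) of one common heuristic: remember the previous token, and treat / as a divide only when a value has just ended. The token names, the tiny token set, and the tokenize function are all made up for illustration; this is not how Perl 5's lexer works, just the simplest version of the idea.

```python
import re

# Token kinds after which a value has just ended, so "/" must be division.
# (Hypothetical token names, chosen for this sketch.)
ENDS_OPERAND = {"NUM", "IDENT", "RPAREN", "REGEX"}

TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<NUM>\d+)
  | (?P<IDENT>[A-Za-z_]\w*)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<OP>=~|[-+*=;])
""", re.VERBOSE)

# A regex literal: "/", then escaped characters or non-slash characters, then "/".
REGEX_RE = re.compile(r"/((?:\\.|[^/\\\n])*)/")


def tokenize(src):
    """Split src into (kind, text) pairs, deciding what each "/" means."""
    tokens = []
    prev = None  # kind of the last non-whitespace token seen
    pos = 0
    while pos < len(src):
        if src[pos] == "/":
            if prev in ENDS_OPERAND:
                # A value just ended, so this slash is the divide operator.
                tokens.append(("DIV", "/"))
                prev = "DIV"
                pos += 1
                continue
            # Otherwise we are where a value should start: read a regex.
            m = REGEX_RE.match(src, pos)
            if m is None:
                raise SyntaxError(f"unterminated regex at offset {pos}")
            tokens.append(("REGEX", m.group(1)))
            prev = "REGEX"
            pos = m.end()
            continue
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        kind = m.lastgroup
        if kind != "WS":  # whitespace does not change the context
            tokens.append((kind, m.group()))
            prev = kind
        pos = m.end()
    return tokens
```

With this, tokenize("5 / 5 / 5;") yields two DIV tokens, while tokenize("x =~ /a b/;") yields a single REGEX token. The heuristic breaks down as soon as an identifier can be a function that takes a regex argument, as in Perl's split /,/, $str, which is part of why a real lexer for this gets complicated.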
So, talking to some of the folks in the #perl6 IRC channel, it seems NQP and STD.pm for Perl 6 both use an LL strategy for parsing. Bison produces an LALR(1) parser, that is, a look-ahead LR parser: it reads input left to right and builds a rightmost derivation in reverse, which makes it a bottom-up parser. An LALR(k) or GLR parser would also give me some trouble, since working from the bottom up I know less about what I am in the middle of parsing when the lexer has to decide what a / means. That doesn't mean it's impossible to make a bottom-up parser for NQP and Perl 6, but it seems more difficult.
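One way to picture why top-down parsing makes this easier: a hand-written recursive-descent (LL-style) parser always knows whether it is expecting an operand or an operator, so it can simply ask the lexer for the right kind of token. The sketch below is in Python and assumes a toy grammar, expr := term (operator term)*; the Lexer class, its next_term and next_operator methods, and parse_expr are hypothetical names for illustration, not anything taken from NQP or STD.pm.

```python
import re

# What may appear where an operand is expected: a number or a /regex/.
TERM_RE = re.compile(r"(?P<NUM>\d+)|/(?P<BODY>(?:\\.|[^/\\\n])*)/")
# What may appear where an infix operator is expected.
OP_RE = re.compile(r"[-+*/]")


class Lexer:
    def __init__(self, src):
        self.src = src
        self.pos = 0

    def _skip_ws(self):
        while self.pos < len(self.src) and self.src[self.pos].isspace():
            self.pos += 1

    def next_term(self):
        """Called where the grammar expects an operand: "/" opens a regex."""
        self._skip_ws()
        m = TERM_RE.match(self.src, self.pos)
        if m is None:
            raise SyntaxError(f"expected a term at offset {self.pos}")
        self.pos = m.end()
        if m.group("NUM") is not None:
            return ("num", int(m.group("NUM")))
        return ("regex", m.group("BODY"))

    def next_operator(self):
        """Called where the grammar expects an operator: "/" is a divide."""
        self._skip_ws()
        m = OP_RE.match(self.src, self.pos)
        if m is None:
            return None  # no operator here, so the expression has ended
        self.pos = m.end()
        return m.group()


def parse_expr(lexer):
    """expr := term (operator term)*, returned as a flat list for illustration."""
    out = [lexer.next_term()]
    while True:
        op = lexer.next_operator()
        if op is None:
            return out
        out.append(("op", op))
        out.append(lexer.next_term())
```

Here parse_expr(Lexer("5 / 5 / 5")) reads both slashes as operators, while parse_expr(Lexer("/ab/ / 2")) reads the first slash as the start of a regex and the third as a divide. The decision never needs lookahead tricks, because the parser's position in the grammar already says which meaning applies.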
I may not understand all the parser details as well as I should, but it seems to me that, for now, I am going to go with an LL-style parsing strategy.