Public API for accessing end position and raw match in parser actions #131
Do I understand correctly that you need both structured values and the raw text? Meaning the new …
Correct, although the end position is probably more important than the raw text, especially since `substring` can produce the raw string from the [start, end] positions and the input. CoffeeScriptRedux keeps the start and end positions of each AST node to generate source maps. Right now, each rule's action concatenates all of its subexpressions' raw text (even ignored whitespace!) to obtain the raw text, and the lengths of the raw strings plus the start offsets are used to calculate the end positions. I'm not sure the raw text is actually used for anything besides that and making the parse tree more human-readable for debugging. The grammar source is here, and you can see that a large portion of it deals with raw values, which I find a little silly. (@michaelficarra is the main developer; I've been contributing to other parts and looking on at the parser with confusion.)
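To make the comparison concrete, here is a minimal JavaScript sketch, assuming a hypothetical node shape of `{ raw, start }` rather than CoffeeScriptRedux's actual AST. It shows that the two pieces of information are interchangeable: the end offset is `start + raw.length`, and the raw text is `input.substring(start, end)`.

```javascript
// Sketch of the two bookkeeping strategies discussed above. The node shape
// ({ raw, start }) is a hypothetical stand-in, not CoffeeScriptRedux's AST.

// Strategy 1 (current): every action concatenates its children's raw text,
// and the end offset is derived from the start offset plus the raw length.
function combineByConcatenation(children) {
  const raw = children.map((c) => c.raw).join("");
  const start = children[0].start;
  return { raw, start, end: start + raw.length };
}

// Strategy 2 (proposed): the parser exposes start and end offsets, and the
// raw text, when it is needed at all, is recovered by slicing the input.
function combineByOffsets(input, start, end) {
  return { raw: input.substring(start, end), start, end };
}

const input = "x = y  + z";
// Children as a grammar would see them, including the ignored whitespace
// that strategy 1 must still carry along to keep the offsets right.
const children = [
  { raw: "x", start: 0 }, { raw: " ", start: 1 }, { raw: "=", start: 2 },
  { raw: " ", start: 3 }, { raw: "y", start: 4 }, { raw: "  ", start: 5 },
  { raw: "+", start: 7 }, { raw: " ", start: 8 }, { raw: "z", start: 9 },
];
const a = combineByConcatenation(children);
const b = combineByOffsets(input, 0, input.length);
console.log(a.end === b.end && a.raw === b.raw); // true
```

This is why the whitespace children are carried along in the first strategy: dropping the two-space run would make the concatenated `raw` shorter than the span it covers, and every end offset computed from it would be wrong. The second strategy has no such hazard because the offsets come straight from the parser.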
What an amazing coincidence. I just recently forked PEGjs and was working on this exact issue. This would be extremely useful for cleaning up my grammar, as you can see. +1.

Edit: Not that amazing. I just noticed I had opened an issue regarding this.
If the raw text is exposed, the end position can easily be obtained by … I personally feel that exposing the matched text makes more sense than exposing the end position. Probably …
I agree. I'll push a patch for that in a minute (this was a quick fix).
CoffeeScriptRedux currently obtains the raw match string by tediously concatenating together all the subexpressions, but this could easily be handled by pegjs.
It's already possible by accessing the `pos` and `input` variables that happen to be in lexical scope:

Which parses `(a (b (c)))`:

It would be really easy to wrap this in a function like the new `offset()` for future compatibility and consistency. I see there's a code generator rewrite coming; otherwise this would be a pull request...
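The idea of wrapping `pos` and `input` in helper functions can be sketched outside PEG.js with a small hand-written recursive-descent parser for the `(a (b (c)))` example. This is an illustration under stated assumptions, not PEG.js's generated code: the helper names `offset`, `endOffset` and `text`, and the `makeHelpers` function, are hypothetical. Each rule remembers where it started, and when the rule finishes, the helpers read the shared `pos` to report the end offset and slice the raw text from `input`.

```javascript
// Hand-written sketch (not PEG.js output): parses nested lists like
// "(a (b (c)))". Actions use hypothetical helpers that close over the
// parser's `input` and `pos`, in the spirit of offset().
function parse(input) {
  let pos = 0;

  // Hypothetical helpers bound to the start offset of the current rule.
  // They read `pos` lazily, so call them only once the rule has matched.
  function makeHelpers(start) {
    return {
      offset: () => start,                      // start of the match
      endOffset: () => pos,                     // end of the match (exclusive)
      text: () => input.substring(start, pos),  // raw matched text
    };
  }

  function skipSpaces() {
    while (input[pos] === " ") pos++;
  }

  function parseAtom() {
    const start = pos;
    while (pos < input.length && /[a-z]/.test(input[pos])) pos++;
    if (pos === start) throw new SyntaxError(`Expected identifier at ${start}`);
    const h = makeHelpers(start);
    return { type: "atom", name: h.text(), start: h.offset(), end: h.endOffset() };
  }

  function parseList() {
    const start = pos;
    if (input[pos] !== "(") throw new SyntaxError(`Expected "(" at ${pos}`);
    pos++; // consume "("
    const items = [];
    skipSpaces();
    while (input[pos] !== ")") {
      if (pos >= input.length) throw new SyntaxError(`Unterminated list at ${start}`);
      items.push(input[pos] === "(" ? parseList() : parseAtom());
      skipSpaces();
    }
    pos++; // consume ")"
    const h = makeHelpers(start);
    return { type: "list", items, start: h.offset(), end: h.endOffset(), raw: h.text() };
  }

  const result = parseList();
  if (pos !== input.length) throw new SyntaxError(`Unexpected input at ${pos}`);
  return result;
}
```

Calling `parse("(a (b (c)))")` yields a root node with `start` 0, `end` 11 and `raw` equal to the whole input, and the nested `(b (c))` list spans offsets 3 to 10. No action has to concatenate its children's text; the raw string and the end position both fall out of the parser's own state.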