Public API for accessing end position and raw match in parser actions #131

kevinmehall · 2012-12-05T03:38:11Z

CoffeeScriptRedux currently obtains the raw match string by tediously concatenating the raw text of all the subexpressions, but this could easily be handled by PEG.js itself.

It's already possible by accessing the pos and input variables that happen to be in lexical scope:

main = expr
expr = "(" fn:[abcd] subexpr:((" " e:expr){return e})? ")"
  { return {start:offset, end:pos,
            raw:input.substring(offset, pos),
            fn:fn, subexpr:subexpr};
  }

Parsing (a (b (c))) with this grammar yields:

{ start: 0,
  end: 11,
  raw: '(a (b (c)))',
  fn: 'a',
  subexpr: 
   { start: 3,
     end: 10,
     raw: '(b (c))',
     fn: 'b',
     subexpr: 
      { start: 6,
        end: 9,
        raw: '(c)',
        fn: 'c',
        subexpr: '' } } }

It would be really easy to wrap this in a function like the new offset() for future-compatibility and consistency. I see there's a code generator rewrite coming, otherwise this would be a pull request...
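For illustration, here is a minimal sketch of what such wrappers might look like in plain JavaScript. The names makeActionHelpers, endOffset() and rawText() are hypothetical and are not part of PEG.js; the sketch only assumes that the parser knows the input string, the start offset of the current match, and the current position.

```javascript
// Hypothetical sketch: makeActionHelpers, endOffset and rawText are
// illustrative names only, not an existing PEG.js API.
function makeActionHelpers(input, startOffset, endPos) {
  return {
    // Start of the current match (what offset() already reports).
    offset: function () { return startOffset; },
    // Position just past the last matched character.
    endOffset: function () { return endPos; },
    // Raw matched text, derived from the two positions.
    rawText: function () { return input.substring(startOffset, endPos); }
  };
}

// The middle node from the example above: start 3, end 10.
var helpers = makeActionHelpers("(a (b (c)))", 3, 10);
console.log(helpers.rawText()); // "(b (c))"
```

An action would then call these helpers instead of reaching for pos and input directly, so the grammar would not depend on the generated parser's internals.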


dmajda · 2012-12-05T07:41:09Z

Do I understand correctly that you need both structured values and the raw text? Meaning the new $ operator isn't enough for your purposes?

kevinmehall · 2012-12-05T08:27:23Z

Correct, although the end position is probably more important than the raw text (especially since substring can produce the raw string given the [start,end] position and input).

CoffeeScriptRedux keeps the start and end position of each AST node to generate source maps. Right now, each rule's action concatenates the raw text of all its subexpressions (even ignored whitespace!) to obtain its own raw text. The end position is then calculated from the start offset and the length of that raw string. I'm not sure the raw text is actually used for anything besides that and making the parse tree more human-readable for debugging.

Grammar source is here, and you can see that a large portion of it is dealing with raw values, which I find a little silly. (@michaelficarra is the main developer; I've been contributing to other parts and looking on at the parser with confusion)
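As a rough sketch of the bookkeeping described above (this is not the actual CoffeeScriptRedux code; combineRaw and the shape of the parts array are made up for illustration), every action has to glue its children's raw text back together and derive the end position from the result:

```javascript
// Hypothetical sketch of the manual approach: each part carries its own raw
// text, and the parent rebuilds its raw text and end position from them.
function combineRaw(startOffset, parts) {
  // Every matched piece must be included, even whitespace the AST ignores,
  // or the computed end position will be wrong.
  var raw = parts.map(function (p) { return p.raw; }).join("");
  return { start: startOffset, end: startOffset + raw.length, raw: raw };
}

// Rebuilding the node for "(b (c))", which starts at offset 3.
var node = combineRaw(3, [
  { raw: "(" }, { raw: "b" }, { raw: " " }, { raw: "(c)" }, { raw: ")" }
]);
console.log(node.end); // 10
```

With a parser-provided end position, the whole function collapses to a single substring call, and forgetting one piece of whitespace can no longer shift every later position.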

michaelficarra · 2012-12-05T20:21:25Z

What an amazing coincidence. I just recently forked PEGjs and was working on this exact issue. This would be extremely useful for cleaning up my grammar, as you can see. +1.

edit: Not that amazing. I just noticed I had opened an issue regarding this.

curvedmark · 2012-12-10T08:40:10Z

> Correct, although the end position is probably more important than the raw text (especially since substring can produce the raw string given the [start,end] position and input).

If the raw text is exposed, the end position can be easily obtained by offset + raw.length.

I personally feel exposing the matched text makes more sense than exposing the end position. Probably raw() or string() or whatever to the action.
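The two options carry the same information, as this small sketch shows. The helper names endFromRaw and rawFromSpan are invented for the example; only String.prototype.substring and length are real APIs here.

```javascript
// Hypothetical helpers showing that either value can be derived from the other.
function endFromRaw(offset, raw) {
  // Given the start offset and the matched text, compute the end position.
  return offset + raw.length;
}

function rawFromSpan(input, start, end) {
  // Given the input and both positions, recover the matched text.
  return input.substring(start, end);
}

var input = "(a (b (c)))";
var raw = rawFromSpan(input, 6, 9); // innermost node from the example
var end = endFromRaw(6, raw);
console.log(raw, end); // "(c)" 9
```

So the choice is mostly about which form is more convenient inside an action, not about what can be expressed.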

dmajda · 2012-12-10T20:09:22Z

> I personally feel exposing the matched text makes more sense than exposing the end position. Probably raw() or string() or whatever to the action.

I agree. I'll push a patch for that in a minute (this was a quick fix).

kevinmehall mentioned this issue Dec 5, 2012

push raw parse value preservation into pegjs michaelficarra/CoffeeScriptRedux#108

Closed

dmajda closed this as completed in bea6b1f Dec 10, 2012

ghost assigned dmajda Dec 10, 2012
