-
Notifications
You must be signed in to change notification settings - Fork 417
Redesign error reporting #198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
It would be better if multiple invocations of the error function lead to multiple errors. You could then continue parsing as much as possible.
I would also recommend adding support for warnings as well. |
Could you name some use case(s) where you would need this? It seems to me that the example you provided would work with my proposal too. I already have some use cases, but I don't know how representative they are, so I'd like to see some more.
I am also thinking about it. A problem with multiple errors and warnings is that they would need a different interface than simple and intuitive “ What kind of API would you find most inutitive here? Again, I have some ideas, but I'd like to see what others think. |
My primary use case is for non-fatal parse errors. Non-terminated strings, identifiers that start with a number, nested comments, missing semicolons, etc. Generally, it is best to let the parser advance as far as possible, so that as many errors can be shown to the user as possible. If the parser could even finish the parse, and let the next steps (e.g. compilation) report errors as well, that would be great. |
Before this commit, the |expected| property of an exception object thrown when a generated parser encountered an error contained expectations as strings. These strings were in a human-readable format suitable for displaying in the UI but not suitable for machine processing. For example, expected string literals included quotes and a string "any character" was used when any character was expected. This commit makes expectations structured objects. This makes the machine processing easier, while still allowing to generate a human-readable representation if needed. Implements part of #198. Speed impact ------------ Before: 1180.41 kB/s After: 1165.31 kB/s Difference: -1.28% Size impact ----------- Before: 863523 b After: 950817 b Difference: 10.10% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
Using a special value to indicate match failure instead of |null| allows actions to return |null| as a regular value. This simplifies e.g. the JSON parser. Note the special value is internal and intentionally undocumented. This means that there is currently no official way how to trigger a match failure from an action. This is a temporary state which will be fixed soon. The negative performance impact (see below) is probably caused by changing lot of comparisons against |null| (which likely check the value against a fixed constant representing |null| in the interpreter) to comparisons against the special value (which likely check the value against another value in the interpreter). Implements part of #198. Speed impact ------------ Before: 1146.82 kB/s After: 1031.25 kB/s Difference: -10.08% Size impact ----------- Before: 950817 b After: 973269 b Difference: 2.36% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
Before this commit, the |?| operator returned an empty string upon unsuccessful match. This commit changes the returned value to |null|. It also updates the PEG.js grammar and the example grammars, which used the value returned by |?| quite often. Returning |null| is possible because it no longer indicates a match failure. I expect that this change will simplify many real-world grammars, as an empty string is almost never desirable as a return value (except some lexer-level rules) and it is often translated into |null| or some other value in action code. Implements part of #198.
After making the |?| operator return |null| instead of an empty string in the previous commit, empty strings were still returned from predicates. This didn't make much sense. Return value of a predicate is unimportant (if you have one in hand, you already know the predicate succeeded) and one could even argue that predicates shouldn't return any value at all. The closest thing to "return no value" in JavaScript is returning |undefined|, so I decided to make predicates return exactly that. Implements part of #198.
The |expected| function allows users to report regular match failures inside actions. If the |expected| function is called, and the reported match failure turns out to be the cause of a parse error, the error message reported by the parser will be in the usual "Expected ... but found ..." format with the description specified in the |expected| call used as part of the message. Implements part of #198. Speed impact ------------ Before: 1146.82 kB/s After: 1031.25 kB/s Difference: -10.08% Size impact ----------- Before: 950817 b After: 973269 b Difference: 2.36% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
This is in anticipation of |peg$error|. The |peg$expected| and |peg$error| internal functions will nicely mirror the |expected| and |error| functions available to user code in actions. Implements part of #198.
The exception-creating code will get somewhat hairy soon, so let's make sure them mess will be contained. Implements part of #198.
It will be possible to create errors with user-supplied messages soon. The |SyntaxError| class needs to be ready for that. Implements part of #198.
The |error| function allows users to report custom match failures inside actions. If the |error| function is called, and the reported match failure turns out to be the cause of a parse error, the error message reported by the parser will be exactly the one specified in the |error| call. Implements part of #198. Speed impact ------------ Before: 999.83 kB/s After: 1000.84 kB/s Difference: 0.10% Size impact ----------- Before: 1017212 b After: 1019968 b Difference: 0.27% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
This is now implemented. The
I think that 12.62% speed penalty and 18.11% size penalty is fine for resolving such a long-standing set of problems. Closing. |
@dmajda: That's great news! I'm so glad |
Before this commit, the |expected| property of an exception object thrown when a generated parser encountered an error contained expectations as strings. These strings were in a human-readable format suitable for displaying in the UI but not suitable for machine processing. For example, expected string literals included quotes and a string "any character" was used when any character was expected. This commit makes expectations structured objects. This makes the machine processing easier, while still allowing to generate a human-readable representation if needed. Implements part of pegjs#198. Speed impact ------------ Before: 1180.41 kB/s After: 1165.31 kB/s Difference: -1.28% Size impact ----------- Before: 863523 b After: 950817 b Difference: 10.10% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
Using a special value to indicate match failure instead of |null| allows actions to return |null| as a regular value. This simplifies e.g. the JSON parser. Note the special value is internal and intentionally undocumented. This means that there is currently no official way how to trigger a match failure from an action. This is a temporary state which will be fixed soon. The negative performance impact (see below) is probably caused by changing lot of comparisons against |null| (which likely check the value against a fixed constant representing |null| in the interpreter) to comparisons against the special value (which likely check the value against another value in the interpreter). Implements part of pegjs#198. Speed impact ------------ Before: 1146.82 kB/s After: 1031.25 kB/s Difference: -10.08% Size impact ----------- Before: 950817 b After: 973269 b Difference: 2.36% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
Before this commit, the |?| operator returned an empty string upon unsuccessful match. This commit changes the returned value to |null|. It also updates the PEG.js grammar and the example grammars, which used the value returned by |?| quite often. Returning |null| is possible because it no longer indicates a match failure. I expect that this change will simplify many real-world grammars, as an empty string is almost never desirable as a return value (except some lexer-level rules) and it is often translated into |null| or some other value in action code. Implements part of pegjs#198.
After making the |?| operator return |null| instead of an empty string in the previous commit, empty strings were still returned from predicates. This didn't make much sense. Return value of a predicate is unimportant (if you have one in hand, you already know the predicate succeeded) and one could even argue that predicates shouldn't return any value at all. The closest thing to "return no value" in JavaScript is returning |undefined|, so I decided to make predicates return exactly that. Implements part of pegjs#198.
The |expected| function allows users to report regular match failures inside actions. If the |expected| function is called, and the reported match failure turns out to be the cause of a parse error, the error message reported by the parser will be in the usual "Expected ... but found ..." format with the description specified in the |expected| call used as part of the message. Implements part of pegjs#198. Speed impact ------------ Before: 1146.82 kB/s After: 1031.25 kB/s Difference: -10.08% Size impact ----------- Before: 950817 b After: 973269 b Difference: 2.36% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
This is in anticipation of |peg$error|. The |peg$expected| and |peg$error| internal functions will nicely mirror the |expected| and |error| functions available to user code in actions. Implements part of pegjs#198.
The exception-creating code will get somewhat hairy soon, so let's make sure them mess will be contained. Implements part of pegjs#198.
It will be possible to create errors with user-supplied messages soon. The |SyntaxError| class needs to be ready for that. Implements part of pegjs#198.
The |error| function allows users to report custom match failures inside actions. If the |error| function is called, and the reported match failure turns out to be the cause of a parse error, the error message reported by the parser will be exactly the one specified in the |error| call. Implements part of pegjs#198. Speed impact ------------ Before: 999.83 kB/s After: 1000.84 kB/s Difference: 0.10% Size impact ----------- Before: 1017212 b After: 1019968 b Difference: 0.27% (Measured by /tools/impact with Node.js v0.6.18 on x86_64 GNU/Linux.)
Implements part of pegjs#198.
Discussion on pegjs/pegjs#198 indicated that PegJS works best if descriptions are only applied to basic symbols and not complex rules. As expected, this leads to much more useful error messages.
So, there is no "warning" function yet? |
The warning function topic is tracked at #325 |
I see three problems with the current error reporting system in PEG.js:
Actions report errors by returning
null
Reporting using
null
is inflexible (it doesn't allow to carry any information about the kind of error) and makes it impossible to returnnull
as a regular value from actions, which would be useful sometimes (for example in a JSON parser).See also null indicates failure - workaround for a grammar that should produce nulls? #17.
There is no way to produce errors with completely custom messages
For example, imagine a HTTP 1.1 parser that wants to report an error about missing
Host
header in a HTTP request with a message like "A HTTP 1.1 request must contain a Host header". Currently the only way to do this is to throw an exception with desired message manually. This is cumbersome and it does not interact well with the backtracking behavior (throwing an exception halts the parsing completely, even when it is possible to backtrack and try another alternatives).The
expected
property ofSyntaxError
is hard to process mechanicallyThe
expected
property ofSyntaxError
is either an array of expectations (each describing one alternative which the parser expected at the position of error) ornull
(meaning that the end of input was expected).Each expectation is represented by a string. This string can mean an expected literal (represented using its PEG.js syntax — a quoted string), character class (represented using its PEG.js syntax —
[...]
), or any character (represented by string"any"
). To differentiate between these cases, one has to parse the string representations manually, which makes building tools based on PEG.js such as autocompleters unnecessarily hard.I plan to redesign the error reporting system to fix these problems. Before I describe the changes I am thinking about, a little description how the current system works is needed.
How Error Reporting Currently Works?
When a PEG.js parser tries to match a string literal, character class, or
.
, and fails, it produces a match failure. It consists of a position and an expectation (description of what the parser tried to match).After producing a match failure, the parser backtracks and possibly tries another alternatives. If none of them succeeds and there is nothing more to try, the parser throws the
SyntaxError
exception. When setting its properties (such as position, expectations, error message, etc.), the parser considers only the furthest match failure (the one with the greatest position). If there are more such failures, their expectations are combined.The situation is little bit more complicated if named rules are used, but this is irrelevant for this discussion.
Proposed Changes
I am thinking about the following changes in the error reporting system:
Expectations in
SyntaxError.expected
won't be represented by strings, but by objects. The objects will have atype
property (with values like"literal"
,"class"
,"any"
,"eof"
) which will determine expectation type. For some expectation types, other properties will conain the details (e.g. for"literal"
expectation where would be avalue
property containing the expected string). This will simplify mechanical processing of the expectations.An alternative to the
type
property is to use classses. But I think thetype
property will be easier to handle for users.Actions will be allowed to return
null
. It will be a regular value and won't signify an error.Actions will be able to trigger a match failure using a new
error
function. It will take an error message as its parameter. Failures triggered by this function (called custom match failures) will consist of a position and an error message. They won't have any expectation.The
error
function won't interrupt the action execution, it will just mark it as failing and save the error message. The actual failure will be produced only after the action execution finishes. If theerror
function is invoked multiple times, the last invocation will win (its error message will be used).Custom match failures will be handled like regular match failures, meaning they won't halt the parsing completely and let the parser backtrack and possibly try another alternatives. There is one difference though — when finally throwing the
SyntaxError
exception, the expectation combination rule which applies to regular match failures won't apply to the custom ones. If in the set of match failures with the furthest position there is at least one custom one, it will simply override the regular ones completely. If there are more custom failures, the one that was produced last will win.A
SyntaxError
exception based on a custom match failure will differ from exceptions based on a regular failure. It'smessage
property will be equal to the failure's error message and itsexpected
property will benull
.Example
On input
2
, the parser generated from the grammar above will produce aSyntaxError
withmessage
set to"The number must be an odd integer."
andexpected
set tonull
Actions will also be able to to trigger a regular match failure using the
expected
function. It will take an expected value description as a parameter. This function will be provided mainly as a convenience for situations where one doesn't need to generate a full error message and automatically generated form "Expected X but "2" found." is enough.A
SyntaxError
exception based on a match failure generated by theexpected
function will be similar to exceptions based on a regular failure.Example
On input
2
, the parser generated from the grammar above will produce aSyntaxError
withmessage
set to"Expected odd integer but "2" found."
andexpected
set to[ { type: "user", description: "odd integer" } ]
Next Steps
I welcome any notes to the proposed changes — please add them as comments. I plan to start implementing the proposal (or some modified version based on feedback) soon.
The text was updated successfully, but these errors were encountered: