feat: add preprocess and postprocess hooks #2730

UziTech · 2023-02-11T17:09:43Z

Marked version: 4.2.12

Description

This is the start of implementing #1878

Add preprocess and postprocess hooks so extensions can process the markdown and options before sending it to marked and they can process the html after marked is done with it.

A couple use cases come to mind. We can have a sanitize extension that works on the final output html. We can have an extension that parses front-matter for the options marked should use during the parsing.

Docs examples:

Example: Set options based on front-matter

import { marked } from 'marked';
import fm from 'front-matter';

// Override function
const hooks = {
  preprocess(markdown) {
    const { attributes, body } = fm(markdown);
    for (const prop in attributes) {
      if (prop in this.options) {
        this.options[prop] = attributes[prop];
      }
    }
    return body;
  }
};

marked.use({ hooks });

// Run marked
console.log(marked.parse(`
---
headerIds: false
---

## test
`.trim()));

Output:

<h2>test</h2>

Example: Sanitize HTML with isomorphic-dompurify

import { marked } from 'marked';
import DOMPurify from 'isomorphic-dompurify';

// Override function
const hooks = {
  postprocess(html) {
    return DOMPurify.sanitize(html);
  }
};

marked.use({ hooks });

// Run marked
console.log(marked.parse(`
<img src=x onerror=alert(1)//>
`));

Output:

<img src="x">

Contributor

Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
no tests required for this PR.
If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

CI is green (no forced merge required).
Squash and Merge PR following conventional commit guidelines.

vercel · 2023-02-11T17:09:47Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated
marked-website	✅ Ready (Inspect)	Visit Preview	💬 Add your feedback	Mar 22, 2023 at 5:44AM (UTC)

UziTech · 2023-02-11T18:30:03Z

src/Hooks.js

+  static passThroughHooks = new Set([
+    'preprocess',
+    'postprocess'
+  ]);


Currently both preprocess and postprocess are "passthrough" hooks, which means the return value of one function gets passed to the next hook.

In the future I could see us adding some hooks that would not pass the return to the next hook. For example the way we call renderer or tokenizer methods is not pass through. The first tokenizer is called and if it returns something other than false we don't call the next tokenizer of the same name at all.

UziTech · 2023-02-11T18:31:54Z

src/marked.js

+      const hooks = marked.defaults.hooks || new Hooks();
+      for (const prop in pack.hooks) {
+        const prevHook = hooks[prop];
+        if (Hooks.passThroughHooks.has(prop)) {


Here is where we determine what to do for the next hook based on whether the hook is a "passthrough" hook. Either we pass the return value to the next function or we stop on false like tokenizers and renderers.

UziTech · 2023-02-11T18:35:33Z

src/marked.js

-  opt = merge({}, marked.defaults, opt || {});
+  const origOpt = { ...opt };
+  opt = { ...marked.defaults, ...origOpt };


I removed the merge helper in favor of the spread operator. This actually sped up marked's benchmarks by about 300ms (~5%).

UziTech · 2023-02-11T18:39:00Z

docs/USING_PRO.md

@@ -255,7 +255,7 @@ smartypants('"this ... string"')

 <h2 id="walk-tokens">Walk Tokens : <code>walkTokens</code></h2>

-The walkTokens function gets called with every token. Child tokens are called before moving on to sibling tokens. Each token is passed by reference so updates are persisted when passed to the parser. The return value of the function is ignored.
+The walkTokens function gets called with every token. Child tokens are called before moving on to sibling tokens. Each token is passed by reference so updates are persisted when passed to the parser. When [`async`](#async) mode is enabled, the return value is awaited. Otherwise the return value is ignored. 


I also merged the changes from #2728 in this PR because I am touching all of the same code.

fixes #2721

UziTech · 2023-02-11T18:46:21Z

It might not be a bad idea to create multiple PRs for this change:

Fix always return a promise if async fix: always return promise if async #2728
Change merge to spread operator
add preprocess and postprocess hooks

I added them all here to get the complete picture and make sure my idea actually worked.

src/marked.js

UziTech · 2023-02-25T02:54:44Z

@calculuschild @styfle thoughts?

Are these type of extension hooks something we want to maintain?

Pros

Enables more types of extensions that need to do something to the markdown/options before parsing or the output HTML.

Cons

More code to maintain. I could see extension creators wanting even more hooks (i.e. between lexing and parsing)

The alternative is for these types of extensions just wrapping marked and including it as a dependency. But there is no easy way to compose multiple extensions that way.

styfle · 2023-02-26T13:38:21Z

I read the PR description but I don’t understand the purpose of this feature.

Since the input of preprocess/postprocess is just the markdown/html, it seems like it would be equivalent to the current usage:

const md = preprocess(markdown)
const html = marked(md)
const output = postprocess(html)

Is there a use case that hooks cover that isn’t covered by this snippet?

UziTech · 2023-02-26T15:12:42Z

No, that is exactly what this does.

The benefit of this PR is that extensions can do that so they can be composable.

The examples that I give are frontmatter and dompurify. If someone creates an npm package that wraps marked to add front matter parsing and another package that wraps marked to do sanitization with dompurify a user wouldn't be able to use them together.

With this PR the user would be able to include both or just one with marked.use.

calculuschild · 2023-03-02T15:44:20Z

My initial impression is this is moving a bit outside the scope of being a Markdown parser, and my inclination is that we should keep Marked.js that way. If I were a developer who wanted to have some kind of frontmatter parsing and a post-sanitizer, I would probably opt to just do it the way @styfle shows, passing the result from process to process rather than embedding everything inside of Marked.js.

If someone creates an npm package that wraps marked

I may be misunderstanding, but why would a package designer want to wrap marked.js like this anyway? Why not just use the base DOMPurify package as an external postprocessing step as we have always done? If a user wants to have both a pre- and post-processing step it seems more practical to just use the packages directly rather than having to build a specialized Marked.js-extension plugin that does the same thing?

A lot of time has passed since #1878.... Now that custom extensions exist I wonder if its worth re-evaluating whether/which hooks are still necessary?

UziTech · 2023-03-03T23:26:26Z

The dompurify and front-matter example are fairly trivial examples, but if someone wants to create an extension like Table of Contents or footnotes that have multiple parts (a tokenizer/renderer and something that modifies the output html) then anyone who wants to use that extension would need to implement multiple parts instead of just being able to write marked.use(extension).

A lot of time has passed since #1878.... Now that custom extensions exist I wonder if its worth re-evaluating whether/which hooks are still necessary?

Ya I don't think all of those hooks are necessary which is why I started a new PR instead of building on that one. That was more of a POC to define which hooks would be needed to get rid of the current options so we could just have extensions. I would still like to deprecate a lot of the current options and find ways to move them to extensions.

I agree this is somewhat out of scope for marked.

@huaichaow do you have any input on this? I know you are trying to write an extension that would take advantage of these hooks. Is there any use cases we should consider that can't be worked around without these hooks?

huaichaow · 2023-03-07T03:41:56Z

Hi @UziTech @styfle @calculuschild

the idea is to have a place to clean up the side effects created in extensions, e.g., the Table-of-Contents extension I've been working on https://github.com/huaichaow/marked-toc-extension/blob/main/src/index.ts.

all the headings found are saved in an array in the walkTokens callback and consumed in the renderer function of the extension. it's ok if marked is fired only once, but in most cases, e.g., in a live editor, it will be executed many times leading to duplicated headings pushed into the array over and over again.

why not pass a new marked?

currently, it's not possible as the settings are cumulative, and there's no method to reset the marked.defaults, please check:

marked.options =
marked.setOptions = function(opt) {
  merge(marked.defaults, opt);
  changeDefaults(marked.defaults);
  return marked;
};

and most importantly, lifecycle hooks will make it more flexible and easier to create Extensions - more stages are available for use in extensions

UziTech · 2023-03-10T22:39:40Z

@styfle @calculuschild what is the final decision? Are we saying these type of hooks are out of scope and should be handled separate from the custom extension?

For example in @huaichaow's example he could export a function that users of his extension should call after each marked run instead of being able to handle cleanup in the extension, like:

import { marked } from 'marked';
import { tocExtension, tocCleanup } from 'marked-toc-extension';

marked.use(tocExtension);

marked.parse(someMarkdown);
tocCleanup();

marked.parse(someOtherMarkdown);
tocCleanup();

I would say I am still on the fence. On the one hand this is slightly out of scope. On the other hand this does seem to complement the custom extensions really well. I can think of many extensions that would need these hooks to use marked.use, but they don't really need to be extensions (like the front-matter and dompurify examples above).

styfle

I'm +0 on this one.

It seems it might be useful for some of those cases mentioned above.

vercel bot deployed to Preview February 11, 2023 17:10 View deployment

UziTech mentioned this pull request Feb 11, 2023

move most options to Hooks #1878

Closed

4 tasks

vercel bot deployed to Preview February 11, 2023 17:12 View deployment

vercel bot deployed to Preview February 11, 2023 17:13 View deployment

vercel bot deployed to Preview February 11, 2023 17:14 View deployment

UziTech commented Feb 11, 2023

View reviewed changes

UziTech added the NFU - new feature (user requested) label Feb 11, 2023

UziTech marked this pull request as ready for review February 11, 2023 18:42

UziTech requested review from calculuschild, styfle, davisjam and joshbruce February 11, 2023 18:43

vercel bot deployed to Preview February 11, 2023 18:50 View deployment

UziTech commented Feb 12, 2023

View reviewed changes

src/marked.js Outdated Show resolved Hide resolved

vercel bot deployed to Preview February 12, 2023 05:37 View deployment

vercel bot deployed to Preview February 12, 2023 05:40 View deployment

vercel bot deployed to Preview February 12, 2023 05:43 View deployment

styfle approved these changes Mar 17, 2023

View reviewed changes

rm-neilsondesa mentioned this pull request Sep 18, 2024

[Snyk] Upgrade: marked, nuxt, sockjs-client rm-neilsondesa/DSOF-Patch-Chat-App#29

Open

NOUIY mentioned this pull request Sep 18, 2024

[Snyk] Upgrade: async, dompurify, etch, underscore-plus, fs-plus, glob, marked, request, semver NOUIY/settings-view#57

Open

Nagraj-Shankar mentioned this pull request Sep 19, 2024

[Snyk] Upgrade: marked, nuxt, sockjs-client Nagraj-Shankar/snyk-chat-goof#28

Open

rm-neilsondesa mentioned this pull request Sep 19, 2024

[Snyk] Upgrade: marked, nuxt, sockjs-client rm-neilsondesa/DSOF-Patch-Chat-App#30

Open

Nagraj-Shankar mentioned this pull request Sep 20, 2024

[Snyk] Upgrade: marked, nuxt, sockjs-client Nagraj-Shankar/snyk-chat-goof#29

Open

rm-neilsondesa mentioned this pull request Sep 20, 2024

[Snyk] Upgrade: marked, nuxt, sockjs-client rm-neilsondesa/DSOF-Patch-Chat-App#31

Open

SebAllouche mentioned this pull request Sep 21, 2024

[Snyk] Upgrade: adm-zip, ms, cfenv, consolidate, dustjs-helpers, errorhandler, express-session, hbs, jquery, marked, mongoose, stream-buffers, typeorm, validator SebAllouche/nodejs-goof#4

Open

Nagraj-Shankar mentioned this pull request Sep 21, 2024

[Snyk] Upgrade: marked, nuxt, sockjs-client Nagraj-Shankar/snyk-chat-goof#30

Open

rm-neilsondesa mentioned this pull request Sep 22, 2024

[Snyk] Upgrade: marked, nuxt, sockjs-client rm-neilsondesa/DSOF-Patch-Chat-App#32

Open

Nagraj-Shankar mentioned this pull request Sep 23, 2024

[Snyk] Upgrade: marked, nuxt, sockjs-client Nagraj-Shankar/snyk-chat-goof#31

Open

rm-neilsondesa mentioned this pull request Sep 23, 2024

[Snyk] Upgrade: marked, nuxt, sockjs-client rm-neilsondesa/DSOF-Patch-Chat-App#33

Open

mkajitansnyk mentioned this pull request Sep 23, 2024

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#2964

Open

SebAllouche mentioned this pull request Sep 24, 2024

[Snyk] Upgrade marked from 4.0.10 to 4.3.0 SebAllouche/nodejs-goof#5

Open

This was referenced Sep 24, 2024

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#2970

Open

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#2978

Open

mkajitansnyk mentioned this pull request Oct 1, 2024

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#2984

Open

stefan-ionita3113 mentioned this pull request Oct 3, 2024

[Snyk] Upgrade marked from 4.0.10 to 4.3.0 stefan-ionita3113/nodejs-goof#7

Open

mkajitansnyk mentioned this pull request Oct 3, 2024

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#2994

Open

thomas-ouellette mentioned this pull request Oct 4, 2024

[Snyk] Upgrade marked from 4.0.10 to 4.3.0 thomas-ouellette/nodejs-goof#6

Open

This was referenced Oct 4, 2024

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#2999

Open

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#3004

Open

mkajitansnyk mentioned this pull request Oct 24, 2024

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#3050

Open

This was referenced Oct 27, 2024

[Snyk] Upgrade marked from 4.0.10 to 4.3.0 thiagarajanb/NodeGoat#5

Open

[Snyk] Upgrade marked from 4.0.10 to 4.3.0 thiagarajanb/nodejs-goof#6

Open

mkajitansnyk mentioned this pull request Nov 1, 2024

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#3058

Open

Sangram99-hacker mentioned this pull request Nov 2, 2024

[Snyk] Upgrade marked from 4.0.10 to 4.3.0 Sangram99-hacker/nodejs-goof#6

Open

mkajitansnyk mentioned this pull request Nov 2, 2024

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#3063

Open

joh748 mentioned this pull request Nov 4, 2024

[Snyk] Upgrade marked from 4.0.10 to 4.3.0 joh748/snyk-goof#7

Open

This was referenced Nov 7, 2024

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#3069

Open

[Snyk] Upgrade marked from 4.0.12 to 4.3.0 mkajitansnyk/gravitee-gateway#3075

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add preprocess and postprocess hooks #2730

feat: add preprocess and postprocess hooks #2730

UziTech commented Feb 11, 2023 •

edited

Loading

vercel bot commented Feb 11, 2023 •

edited

Loading

UziTech Feb 11, 2023

UziTech Feb 11, 2023

UziTech Feb 11, 2023

UziTech Feb 11, 2023

UziTech commented Feb 11, 2023 •

edited

Loading

UziTech commented Feb 25, 2023 •

edited

Loading

styfle commented Feb 26, 2023

UziTech commented Feb 26, 2023 •

edited

Loading

calculuschild commented Mar 2, 2023 •

edited

Loading

UziTech commented Mar 3, 2023 •

edited

Loading

huaichaow commented Mar 7, 2023

UziTech commented Mar 10, 2023

styfle left a comment

feat: add preprocess and postprocess hooks #2730

feat: add preprocess and postprocess hooks #2730

Conversation

UziTech commented Feb 11, 2023 • edited Loading

Description

Docs examples:

Contributor

Committer

vercel bot commented Feb 11, 2023 • edited Loading

UziTech Feb 11, 2023

Choose a reason for hiding this comment

UziTech Feb 11, 2023

Choose a reason for hiding this comment

UziTech Feb 11, 2023

Choose a reason for hiding this comment

UziTech Feb 11, 2023

Choose a reason for hiding this comment

UziTech commented Feb 11, 2023 • edited Loading

UziTech commented Feb 25, 2023 • edited Loading

Pros

Cons

styfle commented Feb 26, 2023

UziTech commented Feb 26, 2023 • edited Loading

calculuschild commented Mar 2, 2023 • edited Loading

UziTech commented Mar 3, 2023 • edited Loading

huaichaow commented Mar 7, 2023

UziTech commented Mar 10, 2023

styfle left a comment

Choose a reason for hiding this comment

UziTech commented Feb 11, 2023 •

edited

Loading

vercel bot commented Feb 11, 2023 •

edited

Loading

UziTech commented Feb 11, 2023 •

edited

Loading

UziTech commented Feb 25, 2023 •

edited

Loading

UziTech commented Feb 26, 2023 •

edited

Loading

calculuschild commented Mar 2, 2023 •

edited

Loading

UziTech commented Mar 3, 2023 •

edited

Loading