Markdown containing tabs not converted properly to spaces #1559

Closed
ankurnarkhede opened this issue Oct 6, 2019 · 9 comments · Fixed by #2434
Labels
L2 - annoying Similar to L1 - broken but there is a known workaround available for the issue released

Comments

@ankurnarkhede

Describe the bug
marked converts every tab in the markdown to 4 spaces, regardless of where it appears. This breaks the alignment of the text, because that is not how tabs work. A tab advances to the next tab stop, so the width of a tab at a given position in a string is
tabsize - ((number of characters occupied in current tab buffer) % tabsize)
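
To make the formula concrete, here is a minimal sketch of tab-stop expansion. The function name `expandTabs` and its `tabSize` parameter are illustrative assumptions, not part of marked or any other package, and the column count treats every character as one column wide.

```javascript
// Hypothetical helper: expand each tab to the next tab stop.
// Assumes every non-tab character occupies exactly one column.
function expandTabs(line, tabSize = 4) {
  let out = '';
  let column = 0;
  for (const ch of line) {
    if (ch === '\t') {
      // Width needed to reach the next multiple of tabSize.
      const width = tabSize - (column % tabSize);
      out += ' '.repeat(width);
      column += width;
    } else {
      out += ch;
      column += 1;
    }
  }
  return out;
}

// Two rows with different label lengths and different tab counts.
const sample = ['[Phone]\t\t\t=> TEST', '[EmailAddress]\t=> TEST'];
for (const row of sample) {
  console.log(expandTabs(row));
}
```

With a tab size of 4, both arrows land in the same column, whereas replacing each tab with a fixed 4 spaces would not line them up.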

image

To Reproduce
Steps to reproduce the behaviour:

  1. Type any text using tabs.
  2. Paste it into the Marked Demo to view the parsed HTML. The alignment changes.

Below is the text that contains tabs that I am parsing to HTML.

# markdown alignment
    [FirstName]				=> TEST
    [LastName]				=> TESTIN
    [EmailAddress]			=> TEST
    [CompanyName]			=> Test Company
    [Phone]					=> TEST

Demo links:

  1. Marked Demo
  2. CommonMark Demo

CommonMark was also unable to convert the tabs to the appropriate number of spaces.

The ParseDown PHP package was able to convert the tabs in the text to the appropriate number of spaces. That is how any user would expect the HTML output to look. You can paste the markdown text above into the ParseDown Demo.
image

Expected behavior
When I write markdown containing tabs, I expect the same alignment after it is converted to HTML. That is what any user would expect.

I am using this text:
image

The arrows aren't aligned upon parsing. It gets parsed to HTML like this:
image
image

Whereas any user would expect it to look like this:
image

@UziTech
Member

UziTech commented Oct 6, 2019

Tabs are probably not the best for this type of alignment. For instance your example isn't aligned in my browser because I have tabs set to 2 spaces.

image

@ankurnarkhede
Author

@UziTech you are right!
But that is the reason marked converts tabs to spaces: so that the output is consistent across all browsers.
Can I get your opinion on this?

@UziTech
Member

UziTech commented Oct 7, 2019

Marked should be more compliant with tabs. I think we should only replace the tabs at the start of a line.

There is still one CommonMark test that is failing for tabs.

"markdown": "-\t\tfoo\n",
"html": "<ul>\n<li>\n<pre><code>  foo\n</code></pre>\n</li>\n</ul>\n",

@ankurnarkhede
Author

@UziTech sounds better!
I will work on this.

@UziTech
Member

UziTech commented Oct 12, 2019

@smartankur4u I am going to answer your questions here:

  1. You said here that the text was not aligned properly in your browser because you had set the tab size to 2 spaces. The text was actually aligned with tabs of size 4, so its alignment will obviously break with other tab sizes. Isn't this exactly why marked converts tabs to spaces, to give the same view on every device?

Tab characters don't have an inherent size. The size can be set by the user. You choose to have them be size 4, I choose to have them be size 2, and I know a developer who prefers them to be size 8.

My point is that the input only looks aligned to a user at their own tab size, and users can define any size for a tab character. When converting tabs to spaces we have to pick an arbitrary size (in the CommonMark spec that size is 4), so using tabs for that type of alignment never works for everyone. That is why, IMHO, tabs should only be used for indentation, not alignment.
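
As a rough illustration of this point, the sketch below computes the column where the arrow lands for two rows from the issue's sample at tab sizes 2, 4, and 8. The helper `columnOf` is a hypothetical name introduced here, and it assumes the search string is present and that every non-tab character is one column wide.

```javascript
// Hypothetical helper: column at which `needle` starts once each tab
// advances to the next tab stop. Assumes `needle` occurs in `line`.
function columnOf(line, needle, tabSize) {
  const end = line.indexOf(needle);
  let column = 0;
  for (const ch of line.slice(0, end)) {
    column = ch === '\t' ? column + tabSize - (column % tabSize) : column + 1;
  }
  return column;
}

// Two rows from the sample in this issue, without the leading indentation.
const rows = ['[FirstName]\t\t\t\t=> TEST', '[EmailAddress]\t\t\t=> TEST'];
for (const tabSize of [2, 4, 8]) {
  console.log(tabSize, rows.map((row) => columnOf(row, '=>', tabSize)));
}
```

The two arrows share column 24 only at tab size 4. At size 2 they fall at columns 18 and 20, and at size 8 at columns 40 and 32.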

  1. Do we need to comply with this CommonMark rule, which breaks the alignment of the text? Users expect the HTML output to match what they wrote in markdown, so don't we need to convert tabs to the appropriate number of spaces?

According to the CommonMark spec, only tabs at the beginning of a line (used for indentation) are converted to 4 spaces. All other tabs are left as tab characters.

This does need to be fixed in marked to be spec compliant.
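
A minimal sketch of the "leading tabs only" behaviour described above, written as a standalone preprocessing function. This is not marked's actual implementation; `expandLeadingTabs` and its `tabSize` parameter are assumptions made for illustration.

```javascript
// Hypothetical helper: expand tabs only in each line's leading whitespace.
// Tabs after the first non-whitespace character are left untouched.
function expandLeadingTabs(markdown, tabSize = 4) {
  return markdown
    .split('\n')
    .map((line) => {
      const indent = line.match(/^[ \t]*/)[0];
      let column = 0;
      let spaces = '';
      for (const ch of indent) {
        // A tab advances to the next tab stop; a space advances one column.
        const width = ch === '\t' ? tabSize - (column % tabSize) : 1;
        spaces += ' '.repeat(width);
        column += width;
      }
      return spaces + line.slice(indent.length);
    })
    .join('\n');
}

// The leading tab becomes spaces; the tabs between words survive.
console.log(JSON.stringify(expandLeadingTabs('\tcode\twith\ttabs')));
```

Because interior tabs are preserved, code snippets that rely on literal tab characters inside a line would pass through this step unchanged.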

  1. Can you please comment on the ParseDown PHP package, which converts everything properly, as the user expects? I have added screenshots and links here: Markdown containing tabs not converted properly to spaces #1559

ParseDown PHP doesn't convert everything as every user expects. If the user doesn't have the tab size set to 4, the alignment is off.

image

  1. Do you expect developers facing this issue to write custom code to convert tabs to spaces before parsing with marked?

Yes, I expect developers to understand their users better than I do. If their users will always have the tab size set to 4, then they can convert tabs using your tab-to-space package before sending the text to marked:

const html = marked(tabToSpace(markdown));

@ankurnarkhede
Author

Thanks for the reply @UziTech.
Got your view. I will be working on the issue to convert leading tabs to spaces.
Thanks.

@UziTech UziTech added the L2 - annoying Similar to L1 - broken but there is a known workaround available for the issue label Dec 4, 2019
@jvalrog

jvalrog commented Jan 26, 2020

Hello, my problem is that Marked is actually changing tabs into spaces.

I need tabs to remain as tabs, that's why I put them in there. I can't copy/paste code snippets in markdown because the tabs are gone.

Is there a way to force Marked to stop doing that?

Thanks

@UziTech
Member

UziTech commented Jan 26, 2020

@jvalrog not currently. But we are always accepting PRs

rossipedia added a commit to docker/marked that referenced this issue Apr 8, 2022
Only replaces tabs at the beginning of a block construct. Tabs in the
middle of the item are unaffected.

All tests passing. Tabs in both GFM and CommonMark at 100%

fixes markedjs#1559
UziTech pushed a commit that referenced this issue Apr 11, 2022
* fix: non leading-tabs in markdown content (#1559)

Only replaces tabs at the beginning of a block construct. Tabs in the
middle of the item are unaffected.

All tests passing. Tabs in both GFM and CommonMark at 100%

fixes #1559

* update new/html_comments.html to preserve tab

* combine redundant if condition

* add test for tab immediately after blockquote character
github-actions bot pushed a commit that referenced this issue Apr 11, 2022
## [4.0.14](v4.0.13...v4.0.14) (2022-04-11)

### Bug Fixes

* only convert leading tabs to spaces ([#1559](#1559)) ([#2434](#2434)) ([7d19665](7d19665))
@github-actions

🎉 This issue has been resolved in version 4.0.14 🎉

The release is available on:

Your semantic-release bot 📦🚀

qtprojectorg pushed a commit to qtqa/gerrit that referenced this issue Apr 28, 2023
Markedjs by default changes leading tabs to spaces. More details:
markedjs/marked#1559

As workaround we extract user suggestion from content of
gr-formatted-text which has tabs and not spaces.

Release-Notes: skip
Google-Bug-Id: b/279925682
Change-Id: I0a6d0223b384838b29753c41437b9174dda0da46
qtprojectorg pushed a commit to qtqa/gerrit that referenced this issue Apr 28, 2023
Markedjs by default changes leading tabs to spaces. More details:
markedjs/marked#1559

As workaround we extract user suggestion from content of
gr-formatted-text which has tabs and not spaces.

Release-Notes: skip
Google-Bug-Id: b/279925682
Change-Id: I0a6d0223b384838b29753c41437b9174dda0da46
(cherry picked from commit a37ffe3)