
Cspell performances with large (7k) words list in cspell configuration #2360

Closed
3 of 18 tasks
nvuillam opened this issue Jan 30, 2022 · 9 comments

Comments

@nvuillam
Contributor

Info

Kind of Issue

  • runtime - command-line tools
  • building / compiling
  • security
  • change in behavior
  • crash / error

Which Tool or library

  • cspell -- the command-line spelling tool
  • cspell-tools -- used for building dictionary files
  • cspell-lib -- library that does the actual spell checking.
  • cspell-trie -- tool for working with trie files.

Which Version

Version: 5.16.0

Issue with supporting library?

  • No
  • cspell-glob -- library for matching glob patterns
  • cspell-io -- thin file i/o library
  • cspell-trie-lib - trie lib
  • cspell-trie2-lib - trie lib alternate format

OS:

  • Macos
  • Linux
  • Windows
  • Other

version:

python:3.9.7-alpine3.13

Bug Description

Describe the bug

[screenshot of the cspell run output]

I have a repo with 1500+ files to check for spelling mistakes.
Before I added a lot of words to .cspell.json, performance was almost acceptable,
but now that I have added a large .cspell.json, a run takes 250 seconds.
Notes:

  • the 1500 files are passed as cspell arguments: cspell file1 file2 file3 ...
  • my dockerfile already contains ENV NODE_OPTIONS="--max-old-space-size=8192"

Is this expected performance, or are there ways to improve it?

Thanks for your tool and your answer :)

@Jason3S
Collaborator

Jason3S commented Jan 30, 2022

@nvuillam,

Checking 1500 files should not be an issue.

It is most likely 1 or 2 files causing it to slow down.

To give the spell checker a list of files to check, use: --file-list. It takes a path to a file or can read from stdin.

Scan files in current directory:

ls -1 | cspell --file-list stdin

I suggest using a custom dictionary to avoid having a large .cspell.json: Custom Dictionaries - CSpell
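A custom dictionary file is plain text with one word per line. A minimal sketch of what such a file could contain (these words are placeholders, not from the actual repo):

```text
megalinter
dockerfile
cspell
```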

@Jason3S
Collaborator

Jason3S commented Jan 30, 2022

I took a look at your PR.

Something to try:

  1. Move the words into a dictionary file:
     jq -r ".words | .[]" .cspell.json > .cspell-words.txt

  2. Change your .cspell.json to be:

{
  "version": "0.2",
  "ignorePaths": [
    "**/node_modules/**",
    "**/vscode-extension/**",
    "**/.git/**",
    ".vscode",
    "megalinter",
    "package-lock.json",
    "report"
  ],
  "language": "en",
  "dictionaryDefinitions": [
    {
      "name": "custom-dictionary",
      "path": "./.cspell-words.txt",
      "addWords": true
    }
  ],
  "dictionaries": [
    "custom-dictionary"
  ],
  "words": []
}
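For environments without jq, the same extraction can be sketched in plain Python (the function name `extract_words` is mine, not part of cspell):

```python
# Move the "words" array out of .cspell.json into a one-word-per-line
# dictionary file, equivalent to the jq one-liner above.
import json


def extract_words(config_path=".cspell.json", dict_path=".cspell-words.txt"):
    with open(config_path) as f:
        config = json.load(f)
    words = config.get("words", [])
    with open(dict_path, "w") as f:
        f.write("\n".join(words) + "\n")
    return len(words)
```

After running it, empty the "words" array in .cspell.json and reference the dictionary file via dictionaryDefinitions as shown above.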

@Jason3S
Collaborator

Jason3S commented Jan 30, 2022

@nvuillam,

I was looking at MegaLinter to see how it called cspell. I even created an issue: oxsecurity/megalinter#1220 . Then I realized you were the maintainer.

@Jason3S
Collaborator

Jason3S commented Jan 30, 2022

@nvuillam,

I compared the difference between the two configurations:

  1. 7k words in .cspell.json
    time cspell  "**"
    CSpell: Files checked: 2493, Issues found: 2565 in 129 files
    cspell "**"  319.47s user 9.57s system 127% cpu 4:18.59 total
    
  2. 7k words in custom dictionary.
    time cspell  "**"
    CSpell: Files checked: 2493, Issues found: 2565 in 129 files
    cspell "**"  50.56s user 2.70s system 116% cpu 45.862 total. 
    

There is a clear speed improvement. I would have to look into the reason, but it could be related to:

  1. Building the internal dictionary from "words" for each file checked (this is very likely).
  2. Scanning for config files for each file checked; this also happens, but can be turned off with "noConfigSearch": true.
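Point 1 can be illustrated with a toy model (pure Python, no cspell internals; the counts are stand-ins for the 7k words and the files being checked):

```python
import time

WORDS = [f"word{i}" for i in range(7000)]  # stand-in for a 7k "words" list
FILES = 100                                # stand-in for the files to check


def check_rebuild_each_file():
    # mimics rebuilding the in-memory dictionary from "words" for every file
    for _ in range(FILES):
        dictionary = set(WORDS)
        assert "word42" in dictionary


def check_build_once():
    # mimics a custom dictionary loaded a single time up front
    dictionary = set(WORDS)
    for _ in range(FILES):
        assert "word42" in dictionary


t0 = time.perf_counter(); check_rebuild_each_file(); rebuild = time.perf_counter() - t0
t0 = time.perf_counter(); check_build_once(); once = time.perf_counter() - t0
print(f"rebuild per file: {rebuild:.3f}s, build once: {once:.3f}s")
```

The per-file rebuild scales with files × words, while the build-once variant pays the construction cost a single time, which matches the benchmark above.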

@Jason3S Jason3S removed the new issue label Jan 30, 2022
@Jason3S Jason3S changed the title Cspell performances on large number of files Cspell performances with large (7k) words list in cspell configuration Jan 30, 2022
@Jason3S
Collaborator

Jason3S commented Jan 30, 2022

@nvuillam,

I was able to speed it up a bit by caching some of the internal word lists. It is still 2x slower than using a custom dictionary.

You can try it out: npx cspell@next.

I'll release 5.18.0 tomorrow.

@nvuillam
Contributor Author

@Jason3S that's great, thanks :)
What technically prevents putting the words into some "virtual dictionary" that would be created at cspell startup?
That would avoid more complex configuration, with the same capabilities ^^

@Jason3S
Collaborator

Jason3S commented Jan 31, 2022

5.18.0 has been published.

What technically prevents putting the words into some "virtual dictionary" that would be created at cspell startup? That would avoid more complex configuration, with the same capabilities ^^

It is possible, but not necessarily desirable.

Every word in a document is checked against all the dictionaries. The size of a dictionary doesn't matter; the lookup cost is based upon the length of the word. Lookups are cached, so looking up the same word again is cheaper.
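That length-based lookup cost is characteristic of a trie, the structure cspell's dictionaries are built on. A minimal sketch (not cspell-trie's actual format):

```python
class Trie:
    """Toy trie: lookup walks at most len(word) nodes, regardless of
    how many words are stored."""

    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-word marker

    def contains(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "$" in node


trie = Trie()
for w in ("spell", "spelling", "checker"):
    trie.add(w)
```

Adding thousands more words deepens or widens the tree but leaves the cost of a single lookup bounded by the word's length.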

The configuration acts like a tree. Each configuration is merged, including any cspell directives found in the document, to create the final configuration used to check the document. words, ignoreWords, and flagWords are also merged to create temporary dictionaries.

The idea here is to keep the number of dictionaries low enough for performance.
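The merge described above can be sketched roughly as follows (a simplification of cspell's actual config resolution; the field names come from the comment, the function is mine):

```python
def merge_configs(base, override):
    # scalar settings in the override win; word lists concatenate
    merged = {**base, **override}
    for key in ("words", "ignoreWords", "flagWords"):
        merged[key] = base.get(key, []) + override.get(key, [])
    return merged


# e.g. a root config merged with an in-document cspell directive
root = {"language": "en", "words": ["megalinter"]}
directive = {"words": ["nvuillam"], "flagWords": ["teh"]}
final = merge_configs(root, directive)
```

Each merged word list becomes a temporary dictionary, which is why a large "words" array is rebuilt repeatedly while a file-based custom dictionary is loaded once.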

@Jason3S
Collaborator

Jason3S commented Jan 31, 2022

I'm going to close this for now, since it is now 4-5x faster than 5.17.

@Jason3S Jason3S closed this as completed Jan 31, 2022
@github-actions
Contributor

github-actions bot commented Mar 3, 2022

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2022