CSpell performance with a large (7k) word list in the cspell configuration #2360
Checking 1500 files should not be an issue. It is most likely one or two files causing it to slow down. You can give the spell checker an explicit list of files to check, or use a glob to scan files in the current directory; see the sketch below.
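With the standard cspell CLI that looks like this (a sketch; the file names are hypothetical, and the glob is quoted so the shell passes it through to cspell):

```sh
# Check an explicit list of files (hypothetical names):
cspell file1.txt file2.md

# Scan all files in the current directory and below:
cspell "**"
```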
I suggest using a custom dictionary to avoid having a large `words` list in the configuration.
I took a look at your PR. Something to try:
jq -r ".words | .[]" .cspell.json > .cspell-words.txt Change your {
"version": "0.2",
"ignorePaths": [
"**/node_modules/**",
"**/vscode-extension/**",
"**/.git/**",
".vscode",
"megalinter",
"package-lock.json",
"report"
],
"language": "en",
"dictionaryDefinitions": [
{
"name": "custom-dictionary",
"path": "./.cspell-words.txt",
"addWords": true
}
],
"dictionaries": [
"custom-dictionary"
],
"words": []
} |
I was looking at MegaLinter to see how it called cspell. I even created an issue: oxsecurity/megalinter#1220. Then I realized you were the maintainer.
I compared the two configurations: there is a clear speed improvement. I would have to look into the exact reason.
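One way to time such a comparison (a sketch; it assumes cspell is runnable via `npx` and that each configuration variant is checked out in turn):

```sh
# Run with the 7k-entry "words" list inline in .cspell.json:
time npx cspell --no-progress "**"

# Switch to the custom-dictionary configuration, then run again:
time npx cspell --no-progress "**"
```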
I was able to speed it up a bit by caching some of the internal word lists. It is still 2x slower than using a custom dictionary. You can try it out once I publish the release.
@Jason3S that's great, thanks :)
5.18.0 has been published.
It is possible, but not necessarily desirable. Every word in a document is checked against all the dictionaries. The size of a dictionary doesn't matter; the lookup cost is based upon the length of the word. Lookups are cached, so looking up the same word again is cheaper. The configuration acts like a tree: each configuration is merged, including any imports. The idea here is to keep the number of dictionaries low enough for performance.
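To illustrate the tree, a nested configuration can import the shared root configuration and add only its local words, instead of defining more dictionaries at every level (a sketch; the path and word are hypothetical):

```json
// hypothetical nested config, e.g. packages/app/.cspell.json
// (cspell allows comments in its JSON config files)
{
  "version": "0.2",
  "import": ["../../.cspell.json"],
  "words": ["mylocalterm"]
}
```

The imported settings are merged with the local ones, so the custom dictionary defined at the root stays available here without a second dictionary definition.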
I'm going to close this for now, since it is now 4-5x faster than 5.17.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Info
Kind of Issue: performance
Which Tool or library: cspell (run via MegaLinter)
Which Version: 5.16.0
Issue with supporting library?
OS: Docker image `python:3.9.7-alpine3.13`
Bug Description
Describe the bug
I have a repo where there are 1500+ files to check for spelling mistakes.
Before I added a lot of words to .cspell.json, performance was almost acceptable.
But now that .cspell.json contains a large (7k) word list, a run takes 250 seconds.
Notes:
- cspell is called with the full list of files as arguments: `cspell file1 file2 file3 ...`
- the Docker image already raises Node's heap limit: `ENV NODE_OPTIONS="--max-old-space-size=8192"`
Is this the expected performance, or are there ways to improve it?
Thanks for your tool and your answer :)