
Comparing changes

base repository: JustinBeckwith/linkinator
base: c915959529247b1177baf975a04a3bc0ef8cca72
head repository: JustinBeckwith/linkinator
compare: 7b9d4936b09a56a8e130de2a70ca17dea3feb41b

Commits on Nov 29, 2020

  1. feat: add basic support for markdown (#188)

    This adds a `--markdown` flag that enables basic markdown link scanning. When `--markdown` is passed on the CLI, or the `markdown: true` option is set, markdown in the local directory is rendered as HTML and scanned.
    JustinBeckwith authored Nov 29, 2020
    524f600
  2. f4822f0
  3. bae9d38
  4. ce649d4
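The markdown flow in #188 above (render markdown to HTML, then scan the resulting HTML for links) can be sketched roughly like this. This is an illustrative sketch only — `markdownToHtml` and `extractLinks` are hypothetical helpers, not linkinator's actual internals, and a real renderer such as `marked` handles far more syntax:

```javascript
// Rough sketch of the --markdown flow: convert markdown links to anchor
// tags, then reuse an HTML link scanner on the result. Illustration only.
function markdownToHtml(md) {
  // Convert [text](url) links; a real markdown renderer does far more.
  return md.replace(/\[([^\]]*)\]\(([^)]+)\)/g, '<a href="$2">$1</a>');
}

function extractLinks(html) {
  const links = [];
  const re = /<a\s+href="([^"]+)"/g;
  let m;
  while ((m = re.exec(html)) !== null) links.push(m[1]);
  return links;
}

const md = 'See the [docs](https://example.com/docs) and [home](/index.md).';
const links = extractLinks(markdownToHtml(md));
// links → ['https://example.com/docs', '/index.md']
```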

Commits on Dec 1, 2020

  1. e70dff6
  2. 8913f87

Commits on Dec 2, 2020

  1. 429b325

Commits on Dec 3, 2020

  1. b47f4b6
  2. c40be4b
  3. chore(deps): update dependency execa to v5 (#201)

    Co-authored-by: Renovate Bot <bot@renovateapp.com>
    renovate[bot] and renovate-bot authored Dec 3, 2020
    c7fa9ad

Commits on Dec 6, 2020

  1. 8d8472a

Commits on Dec 7, 2020

  1. 0c8cd4b

Commits on Dec 21, 2020

  1. 217074e

Commits on Dec 22, 2020

  1. 7c84936

Commits on Dec 24, 2020

  1. feat: add verbosity flag to CLI (#214)

    This adds a `--verbosity` flag, which defaults to WARNING. Skipped links are now hidden by default unless verbosity is set to INFO or DEBUG.
    JustinBeckwith authored Dec 24, 2020
    d20cff5
  2. cf29469
  3. 9eb5590
  4. a8c0a43
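The default-WARNING behavior in #214 can be sketched as a simple level threshold: each link state maps to a minimum verbosity at which it is shown. The numeric ordering below is an assumption for illustration, not linkinator's actual logger:

```javascript
// Sketch of how a verbosity threshold hides skipped links by default.
// The level numbering is an assumption for illustration.
const LogLevel = { DEBUG: 0, INFO: 1, WARNING: 2, ERROR: 3, NONE: 4 };

function visibleLinks(links, verbosity) {
  return links.filter(link => {
    if (link.state === 'BROKEN') return verbosity <= LogLevel.ERROR;
    if (link.state === 'OK') return verbosity <= LogLevel.WARNING;
    return verbosity <= LogLevel.INFO; // SKIPPED only shows at INFO/DEBUG
  });
}

const links = [
  { url: 'https://a', state: 'OK' },
  { url: 'https://b', state: 'SKIPPED' },
  { url: 'https://c', state: 'BROKEN' },
];
// The default WARNING verbosity hides the skipped link:
const shown = visibleLinks(links, LogLevel.WARNING);
// shown.map(l => l.state) → ['OK', 'BROKEN']
```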

Commits on Dec 26, 2020

  1. fee112b
  2. 679d64f

Commits on Dec 28, 2020

  1. c752724
  2. 6e49545

Commits on Dec 29, 2020

  1. 6f8d65a
  2. feat: support directory listings (#225)

    In addition to providing the directory listing flag, this swaps the underlying HTTP server from `serve-static` to `serve-handler`. There should be no user-facing changes from that swap.
    JustinBeckwith authored Dec 29, 2020
    39cf9d2
  3. a7d8625
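A directory listing is just an index page whose entries are ordinary links, which is what lets the crawler follow into subdirectories. A minimal sketch of the idea — this is not serve-handler's actual output format:

```javascript
// Minimal sketch: a directory index page is plain HTML links, so the
// crawler can traverse it like any other page. Illustration only.
function directoryIndexHtml(dirName, entries) {
  const items = entries
    .map(e => `<li><a href="${e}">${e}</a></li>`)
    .join('\n');
  return `<html><body><h1>Index of ${dirName}</h1><ul>\n${items}\n</ul></body></html>`;
}

const html = directoryIndexHtml('/docs', ['guide.md', 'api/']);
```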

Commits on Dec 30, 2020

  1. d7c4758
  2. 96a8750
  3. 936af89

Commits on Jan 3, 2021

  1. feat: introduce retry-after detection (#221)

    This introduces a `--retry` flag which, when passed, will automatically retry requests that come back with an HTTP 429 and a `retry-after` header. I tested this against GitHub, and it appears to work as expected.
    JustinBeckwith authored Jan 3, 2021
    cebea21
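The `retry-after` header referenced in #221 can carry either a number of seconds or an HTTP date. A minimal sketch of computing the delay under those semantics — not linkinator's exact implementation:

```javascript
// Sketch of retry-after handling for HTTP 429 responses. The header is
// either delta-seconds or an HTTP date; this mirrors those semantics.
function secondsUntilRetry(retryAfterHeader, now = Date.now()) {
  const asSeconds = Number(retryAfterHeader);
  if (!Number.isNaN(asSeconds)) return asSeconds;
  // Otherwise treat it as an HTTP date.
  const when = new Date(retryAfterHeader).getTime();
  return Math.max(0, Math.round((when - now) / 1000));
}

// A numeric header is used directly:
const delay = secondsUntilRetry('120');
// delay → 120
```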

Commits on Jan 4, 2021

  1. 1850490

Commits on Jan 5, 2021

  1. fix: map paths in results back to filesystem (#231)

    Fixes #166.  This updates the returned paths in the results to map to the filesystem if a local path was given.
    JustinBeckwith authored Jan 5, 2021
    5f7bb18
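The fix in #231 above amounts to translating result URLs served from the temporary local web server back to paths on disk. A sketch of that mapping — the `webRoot`/`localRoot` names are assumptions for illustration:

```javascript
// Sketch of mapping a scanned URL from the temporary local server back
// to the filesystem path that was given. Illustration only.
function mapToFilesystem(resultUrl, webRoot, localRoot) {
  if (!resultUrl.startsWith(webRoot)) return resultUrl; // external link
  const relative = resultUrl.slice(webRoot.length).replace(/^\//, '');
  return `${localRoot}/${relative}`;
}

const mapped = mapToFilesystem(
  'http://localhost:5000/deep/page.html',
  'http://localhost:5000',
  './docs'
);
// mapped → './docs/deep/page.html'
```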

Commits on Jan 7, 2021

  1. fix(deps): update dependency meow to v9 (#232)

    Co-authored-by: Renovate Bot <bot@renovateapp.com>
    renovate[bot] and renovate-bot authored Jan 7, 2021
    6374c60

Commits on Jan 10, 2021

  1. 5d347f0
  2. 9ff9e9a
  3. build: disable package-lock.json properly (#236)

    The default setting is to always generate a package-lock.json file, so this step is required to prevent its generation.
    
    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 10, 2021
    e6c539a
  4. docs: README.md Markdown tweaks (#238)

    * add newlines before/after code blocks and lists
    * remove dollar sign from snippets since it makes copying harder and the commands don't have any output
    * fix headings hierarchy
    
    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 10, 2021
    105d783
  5. build: add CodeQL scanning (#234)

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 10, 2021
    b6ca492
  6. build: update CI config (#235)

    * move Node.js version to an environment variable
    * update to `actions/setup-node@v2`
    XhmikosR authored Jan 10, 2021
    4f12838

Commits on Jan 11, 2021

  1. 71d46aa
  2. b78be5d
  3. build: fix release action (#240)

    When we are not on the upstream repo, don't run the release action.
    
    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 11, 2021
    071a220

Commits on Jan 22, 2021

  1. 0433251

Commits on Jan 23, 2021

  1. f14c912

Commits on Jan 24, 2021

  1. 026a012
  2. cb1d808
  3. 1b35af6
  4. build: remove npm link from docs-test too (#255)

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 24, 2021
    43cb074
  5. build: CI: add caching for Windows too (#253)

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 24, 2021
    54cfe7d
  6. chore: README.md: remove .svg from badges (#252)

    It's the default.
    
    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 24, 2021
    a5ac753
  7. build: CI: switch to Node.js 14 (#254)

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 24, 2021
    8c6b6cc
Showing with 9,695 additions and 277 deletions.
  1. +9 −0 .c8rc
  2. +30 −23 .github/workflows/ci.yaml
  3. +35 −0 .github/workflows/codeql.yml
  4. +1 −2 .gitignore
  5. +3 −2 .mocharc.json
  6. +4 −0 .releaserc.json
  7. +150 −40 README.md
  8. +9 −0 SECURITY.md
  9. +7,570 −0 package-lock.json
  10. +26 −24 package.json
  11. +1 −1 renovate.json
  12. +180 −57 src/cli.ts
  13. +7 −0 src/config.ts
  14. +238 −86 src/index.ts
  15. +50 −25 src/links.ts
  16. +49 −0 src/logger.ts
  17. +149 −0 src/options.ts
  18. +75 −0 src/queue.ts
  19. +123 −0 src/server.ts
  20. +2 −1 test/fixtures/config/linkinator.config.json
  21. +6 −0 test/fixtures/config/skip-array-config.json
  22. +1 −0 test/fixtures/directoryIndex/README.md
  23. +1 −0 test/fixtures/directoryIndex/dir1/dir1.md
  24. 0 test/fixtures/directoryIndex/dir2/dir2.md
  25. +5 −0 test/fixtures/local/index.html
  26. +5 −0 test/fixtures/local/page2.html
  27. +21 −0 test/fixtures/markdown/LICENSE.md
  28. +6 −0 test/fixtures/markdown/README.md
  29. BIN test/fixtures/markdown/boo.jpg
  30. +4 −0 test/fixtures/markdown/deep/deep.md
  31. +4 −0 test/fixtures/markdown/unlinked.md
  32. +5 −0 test/fixtures/nested/doll1/index.html
  33. +5 −0 test/fixtures/nested/doll2/index.html
  34. +7 −0 test/fixtures/retry/index.html
  35. +5 −0 test/fixtures/retry/subpage.html
  36. +5 −0 test/fixtures/retryCLI/index.html
  37. +21 −0 test/fixtures/rewrite/LICENSE.md
  38. +2 −0 test/fixtures/rewrite/README.md
  39. +5 −0 test/fixtures/server/5.0/index.html
  40. +5 −0 test/fixtures/server/bag/bag.html
  41. +5 −0 test/fixtures/server/index.html
  42. +1 −0 test/fixtures/server/script.js
  43. +5 −0 test/fixtures/server/test.html
  44. +4 −0 test/fixtures/srcset/_site/bar.html
  45. +4 −0 test/fixtures/srcset/_site/foo.html
  46. +5 −0 test/fixtures/srcset/index.html
  47. +11 −0 test/fixtures/twittercard/index.html
  48. +271 −0 test/test.cli.ts
  49. +272 −1 test/{test.ts → test.index.ts}
  50. +213 −0 test/test.retry.ts
  51. +79 −0 test/test.server.ts
  52. +0 −13 test/zcli.ts
  53. +1 −2 tsconfig.json
9 changes: 9 additions & 0 deletions .c8rc
@@ -0,0 +1,9 @@
{
"exclude": [
"build/test"
],
"reporter": [
"html",
"text"
]
}
53 changes: 30 additions & 23 deletions .github/workflows/ci.yaml
@@ -1,65 +1,72 @@
on:
push:
branches:
- master
- main
pull_request:
name: ci
env:
FORCE_COLOR: 2
NODE: 14
jobs:
test:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
node: [10, 12, 14]
node: [10, 12, 14, 16]
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
- uses: actions/setup-node@v2
with:
node-version: ${{ matrix.node }}
- run: npm install
cache: npm
- run: npm ci
- run: npm test
- run: npm run codecov
if: matrix.node == env.NODE
windows:
runs-on: windows-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
- uses: actions/setup-node@v2
with:
node-version: 12
- run: npm install
node-version: ${{ env.NODE }}
cache: npm
- run: npm ci
- run: npm test
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
- uses: actions/setup-node@v2
with:
node-version: 12
- run: npm install
node-version: ${{ env.NODE }}
cache: npm
- run: npm ci
- run: npm run lint
coverage:
docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 12
- run: npm install
- run: npm test
- run: npm run codecov
- uses: codecov/codecov-action@v1
- uses: actions/setup-node@v2
with:
token: ${{ secrets.CODECOV_TOKEN }}
node-version: ${{ env.NODE }}
cache: npm
- run: npm ci
- run: npm run docs-test
release:
if: github.ref == 'refs/heads/master'
if: github.repository == 'JustinBeckwith/linkinator' && github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
needs: [test, lint]
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
- uses: actions/setup-node@v2
with:
node-version: 12
- run: npm install
node-version: ${{ env.NODE }}
cache: npm
- run: npm ci
- run: npm run compile
- run: npm run build-binaries
- run: npx semantic-release
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
35 changes: 35 additions & 0 deletions .github/workflows/codeql.yml
@@ -0,0 +1,35 @@
name: "CodeQL"

on:
push:
branches:
- main
- "!renovate/**"
pull_request:
# The branches below must be a subset of the branches above
branches:
- main
- "!renovate/**"
schedule:
- cron: "0 0 * * 0"

jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write

steps:
- name: Checkout repository
uses: actions/checkout@v2

- name: Initialize CodeQL
uses: github/codeql-action/init@v1
with:
languages: "javascript"

- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v1
3 changes: 1 addition & 2 deletions .gitignore
@@ -1,5 +1,4 @@
node_modules/
package-lock.json
.nyc_output
build/
coverage
.vscode
5 changes: 3 additions & 2 deletions .mocharc.json
@@ -1,6 +1,7 @@
{
"check-leaks": true,
"timeout": 10000,
"timeout": 2000,
"throw-deprecation": true,
"enable-source-maps": true
"enable-source-maps": true,
"exit": true
}
4 changes: 4 additions & 0 deletions .releaserc.json
@@ -0,0 +1,4 @@
{
"assets": "build/binaries/*",
"branches": ["main"]
}
190 changes: 150 additions & 40 deletions README.md
@@ -1,107 +1,148 @@
# 🐿 linkinator

> A super simple site crawler and broken link checker.
[![npm version](https://img.shields.io/npm/v/linkinator.svg)](https://www.npmjs.org/package/linkinator)
[![Build Status](https://api.cirrus-ci.com/github/JustinBeckwith/linkinator.svg)](https://cirrus-ci.com/github/JustinBeckwith/linkinator)
[![codecov](https://codecov.io/gh/JustinBeckwith/linkinator/branch/master/graph/badge.svg)](https://codecov.io/gh/JustinBeckwith/linkinator)
[![Dependency Status](https://img.shields.io/david/JustinBeckwith/linkinator.svg)](https://david-dm.org/JustinBeckwith/linkinator)
[![Known Vulnerabilities](https://snyk.io/test/github/JustinBeckwith/linkinator/badge.svg)](https://snyk.io/test/github/JustinBeckwith/linkinator)
[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/semantic-release/semantic-release)
[![npm version](https://img.shields.io/npm/v/linkinator)](https://www.npmjs.org/package/linkinator)
[![Build Status](https://img.shields.io/github/workflow/status/JustinBeckwith/linkinator/ci/main)](https://github.com/JustinBeckwith/linkinator/actions?query=branch%3Amain+workflow%3Aci)
[![codecov](https://img.shields.io/codecov/c/github/JustinBeckwith/linkinator/main)](https://codecov.io/gh/JustinBeckwith/linkinator)
[![Known Vulnerabilities](https://img.shields.io/snyk/vulnerabilities/github/JustinBeckwith/linkinator)](https://snyk.io/test/github/JustinBeckwith/linkinator)
[![Code Style: Google](https://img.shields.io/badge/code%20style-google-blueviolet)](https://github.com/google/gts)
[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079)](https://github.com/semantic-release/semantic-release)


Behold my latest inator! The `linkinator` provides an API and CLI for crawling websites and validating links. It's got a ton of sweet features:
- 🔥Easily perform scans on remote sites or local files
- 🔥Scan any element that includes links, not just `<a href>`
- 🔥Supports redirects, absolute links, relative links, all the things
- 🔥Configure specific regex patterns to skip

- 🔥 Easily perform scans on remote sites or local files
- 🔥 Scan any element that includes links, not just `<a href>`
- 🔥 Supports redirects, absolute links, relative links, all the things
- 🔥 Configure specific regex patterns to skip
- 🔥 Scan markdown files without transpilation

## Installation

```sh
$ npm install linkinator
npm install linkinator
```

Not into the whole node.js or npm thing? You can also download a standalone binary that bundles node, linkinator, and anything else you need. See [releases](https://github.com/JustinBeckwith/linkinator/releases).

## Command Usage

You can use this as a library, or as a CLI. Let's see the CLI!

```sh
$ linkinator LOCATION [ --arguments ]
```text
$ linkinator LOCATIONS [ --arguments ]
Positional arguments
LOCATION
Required. Either the URL or the path on disk to check for broken links.
LOCATIONS
Required. Either the URLs or the paths on disk to check for broken links.
Supports multiple paths, and globs.
Flags
--concurrency
The number of connections to make simultaneously. Defaults to 100.
--config
Path to the config file to use. Looks for `linkinator.config.json` by default.
--concurrency
The number of connections to make simultaneously. Defaults to 100.
--directory-listing
Include an automatic directory index file when linking to a directory.
Defaults to 'false'.
--recurse, -r
Recursively follow links on the same root domain.
--format, -f
Return the data in CSV or JSON format.
--skip, -s
List of urls in regexy form to not include in the check.
--help
Show this command.
--include, -i
List of urls in regexy form to include. The opposite of --skip.
--format, -f
Return the data in CSV or JSON format.
--markdown
Automatically parse and scan markdown if scanning from a location on disk.
--recurse, -r
Recursively follow links on the same root domain.
    --retry
Automatically retry requests that return HTTP 429 responses and include
a 'retry-after' header. Defaults to false.
--silent
Only output broken links.
--server-root
    When scanning a local directory, customize the location on disk
where the server is started. Defaults to the path passed in [LOCATION].
--skip, -s
List of urls in regexy form to not include in the check.
--timeout
Request timeout in ms. Defaults to 0 (no timeout).
--help
Show this command.
--url-rewrite-search
Pattern to search for in urls. Must be used with --url-rewrite-replace.
--url-rewrite-replace
Expression used to replace search content. Must be used with --url-rewrite-search.
--verbosity
Override the default verbosity for this command. Available options are
'debug', 'info', 'warning', 'error', and 'none'. Defaults to 'warning'.
```

### Command Examples

You can run a shallow scan of a website for busted links:

```sh
$ npx linkinator http://jbeckwith.com
npx linkinator http://jbeckwith.com
```

That was fun. What about local files? The linkinator will stand up a static web server for yinz:

```sh
$ npx linkinator ./docs
npx linkinator ./docs
```

But that only gets the top level of links. Let's go deeper and do a full recursive scan!

```sh
$ npx linkinator ./docs --recurse
npx linkinator ./docs --recurse
```

Aw, snap. I didn't want that to check *those* links. Let's skip em:

```sh
$ npx linkinator ./docs --skip www.googleapis.com
npx linkinator ./docs --skip www.googleapis.com
```

The `--skip` parameter will accept any regex! You can do more complex matching, or even tell it to only scan links with a given domain:

```sh
$ linkinator http://jbeckwith.com --skip '^(?!http://jbeckwith.com)'
linkinator http://jbeckwith.com --skip '^(?!http://jbeckwith.com)'
```

Maybe you're going to pipe the output to another program. Use the `--format` option to get JSON or CSV!

```sh
$ linkinator ./docs --format CSV
linkinator ./docs --format CSV
```

Let's make sure the `README.md` in our repo doesn't have any busted links:

```sh
linkinator ./README.md --markdown
```

You know what, we better check all of the markdown files!

```sh
linkinator "**/*.md" --markdown
```

### Configuration file

You can pass options directly to the `linkinator` CLI, or you can define a config file. By default, `linkinator` will look for a `linkinator.config.json` file in the current working directory.

All options are optional. It should look like this:
@@ -113,36 +154,73 @@ All options are optional. It should look like this:
"silent": true,
"concurrency": 100,
"timeout": 0,
"markdown": true,
"directoryListing": true,
"skip": "www.googleapis.com"
}
```

To load config settings outside the CWD, you can pass the `--config` flag to the `linkinator` CLI:

```sh
$ linkinator --config /some/path/your-config.json
linkinator --config /some/path/your-config.json
```

## GitHub Actions

You can use `linkinator` as a GitHub Action as well, using [JustinBeckwith/linkinator-action](https://github.com/JustinBeckwith/linkinator-action):

```yaml
on:
push:
branches:
- main
pull_request:
name: ci
jobs:
linkinator:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: JustinBeckwith/linkinator-action@v1
with:
paths: README.md
```
To see all options or to learn more, visit [JustinBeckwith/linkinator-action](https://github.com/JustinBeckwith/linkinator-action).
## API Usage
#### linkinator.check(options)
### linkinator.check(options)
Asynchronous method that runs a site wide scan. Options come in the form of an object that includes:
- `path` (string) - A fully qualified path to the url to be scanned, or the path to the directory on disk that contains files to be scanned. *required*.
- `path` (string|string[]) - A fully qualified path to the url to be scanned, or the path(s) to the directory on disk that contains files to be scanned. *required*.
- `concurrency` (number) - The number of connections to make simultaneously. Defaults to 100.
- `port` (number) - When the `path` is provided as a local path on disk, the `port` on which to start the temporary web server. Defaults to a random high range order port.
- `recurse` (boolean) - By default, all scans are shallow. Only the top level links on the requested page will be scanned. By setting `recurse` to `true`, the crawler will follow all links on the page, and continue scanning links **on the same domain** for as long as it can go. Results are cached, so no worries about loops.
- `retry` (boolean|RetryConfig) - Automatically retry requests that respond with an HTTP 429, and include a `retry-after` header. The `RetryConfig` option is a placeholder for fine-grained controls to be implemented at a later time, and is only included here to signal forward-compatibility.
- `serverRoot` (string) - When scanning a local directory, customize the location on disk
where the server is started. Defaults to the path passed in `path`.
- `timeout` (number) - By default, requests made by linkinator do not time out (or follow the settings of the OS). This option (in milliseconds) will fail requests after the configured amount of time.
- `markdown` (boolean) - Automatically parse and scan markdown if scanning from a location on disk.
- `linksToSkip` (array | function) - An array of regular expression strings that should be skipped, OR an async function that's called for each link with the link URL as its only argument. Return a Promise that resolves to `true` to skip the link or `false` to check it.
- `directoryListing` (boolean) - Automatically serve a static file listing page when serving a directory. Defaults to `false`.
- `urlRewriteExpressions` (array) - Collection of objects that contain a search pattern, and replacement.
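The two `linksToSkip` shapes described above — an array of regex strings, or an async predicate — can be sketched in isolation like this. `shouldSkip` is an illustrative helper, not linkinator's internal logic:

```javascript
// Sketch of the two linksToSkip forms: an array of regex strings, or an
// async function that resolves true to skip the link. Illustration only.
async function shouldSkip(url, linksToSkip) {
  if (typeof linksToSkip === 'function') {
    return linksToSkip(url);
  }
  return linksToSkip.some(pattern => new RegExp(pattern).test(url));
}

// Array form: skip anything matching the pattern.
const byRegex = shouldSkip('https://www.googleapis.com/x', ['www\\.googleapis\\.com']);
// Function form: skip URLs containing 'internal'.
const byFn = shouldSkip('https://internal.example.com', async url => url.includes('internal'));
```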

### linkinator.LinkChecker()

#### linkinator.LinkChecker()
Constructor method that can be used to create a new `LinkChecker` instance. This is particularly useful if you want to receive events as the crawler crawls. Exposes the following events:

- `pagestart` (string) - Provides the url that the crawler has just started to scan.
- `link` (object) - Provides an object with
- `url` (string) - The url that was scanned
- `state` (string) - The result of the scan. Potential values include `BROKEN`, `OK`, or `SKIPPED`.
- `status` (number) - The HTTP status code of the request.

### Simple example
### Examples

#### Simple example

```js
const link = require('linkinator');
@@ -178,7 +256,7 @@ async function simple() {
simple();
```

### Complete example
#### Complete example

In most cases you're going to want to respond to events, as running the check command can kinda take a long time.

@@ -235,6 +313,38 @@ async function complex() {
complex();
```

## Tips & Tricks

### Using a proxy

This library supports proxies via the `HTTP_PROXY` and `HTTPS_PROXY` environment variables. This [guide](https://www.golinuxcloud.com/set-up-proxy-http-proxy-environment-variable/) provides a nice overview of how to format and set these variables.

### Globbing

You may have noticed in the example, when using a glob the pattern is encapsulated in quotes:

```sh
linkinator "**/*.md" --markdown
```

Without the quotes, some shells will attempt to expand the glob paths on their own. Various shells (bash, zsh) have different, somewhat unpredictable behaviors when left to their own devices. Using the quotes ensures consistent, predictable behavior by letting the library expand the pattern.

### Debugging

Oftentimes when a link fails, it's an easy-to-spot typo, or a clear 404. Other times ... you may need more details on exactly what went wrong. To see a full call stack for the HTTP request failure, use `--verbosity DEBUG`:

```sh
linkinator https://jbeckwith.com --verbosity DEBUG
```

### Controlling Output

The `--verbosity` flag offers preset options for controlling the output, but you may want more control. Using [`jq`](https://stedolan.github.io/jq/) and `--format JSON` - you can do just that!

```sh
linkinator https://jbeckwith.com --verbosity DEBUG --format JSON | jq '.links | .[] | select(.state | contains("BROKEN"))'
```

## License

[MIT](LICENSE)
[MIT](LICENSE.md)
9 changes: 9 additions & 0 deletions SECURITY.md
@@ -0,0 +1,9 @@
# Security Policy

## Supported Versions

The only version of this library that receives regular patches, or updated dependencies is the latest semver major version (2.x). This is free open source software I build in my spare time, so no promises on, well, you know, anything.

## Reporting a Vulnerability

To report a vulnerability, please email me: justin.beckwith@gmail.com.
7,570 changes: 7,570 additions & 0 deletions package-lock.json

Large diffs are not rendered by default.

50 changes: 26 additions & 24 deletions package.json
@@ -4,6 +4,7 @@
"version": "0.0.0",
"license": "MIT",
"repository": "JustinBeckwith/linkinator",
"author": "Justin Beckwith",
"main": "build/src/index.js",
"types": "build/src/index.d.ts",
"bin": {
@@ -12,44 +13,50 @@
"scripts": {
"pretest": "npm run compile",
"prepare": "npm run compile",
"codecov": "c8 report --reporter=json && codecov -f coverage/*.json",
"compile": "tsc -p .",
"test": "c8 mocha build/test",
"fix": "gts fix",
"codecov": "c8 report --reporter=json && codecov -f coverage/*.json",
"lint": "gts check"
"lint": "gts lint",
"build-binaries": "pkg . --out-path build/binaries",
"docs-test": "node build/src/cli.js ./README.md"
},
"dependencies": {
"chalk": "^4.0.0",
"cheerio": "^1.0.0-rc.2",
"finalhandler": "^1.1.2",
"escape-html": "^1.0.3",
"gaxios": "^4.0.0",
"glob": "^7.1.6",
"htmlparser2": "^7.1.2",
"jsonexport": "^3.0.0",
"meow": "^8.0.0",
"p-queue": "^6.2.1",
"serve-static": "^1.14.1",
"marked": "^2.0.0",
"meow": "^9.0.0",
"mime": "^2.5.0",
"server-destroy": "^1.0.1",
"update-notifier": "^5.0.0"
},
"devDependencies": {
"@types/chai": "^4.2.7",
"@types/cheerio": "^0.22.10",
"@types/finalhandler": "^1.1.0",
"@types/escape-html": "^1.0.0",
"@types/glob": "^7.1.3",
"@types/marked": "^2.0.0",
"@types/meow": "^5.0.0",
"@types/mocha": "^8.0.0",
"@types/node": "^12.7.12",
"@types/serve-static": "^1.13.3",
"@types/mime": "^2.0.3",
"@types/mocha": "^9.0.0",
"@types/node": "^14.0.0",
"@types/server-destroy": "^1.0.0",
"@types/sinon": "^9.0.0",
"@types/sinon": "^10.0.0",
"@types/update-notifier": "^5.0.0",
"c8": "^7.0.0",
"chai": "^4.2.0",
"codecov": "^3.6.1",
"execa": "^4.0.0",
"codecov": "^3.8.1",
"execa": "^5.0.0",
"gts": "^3.0.0",
"mocha": "^8.0.0",
"mocha": "^9.0.0",
"nock": "^13.0.0",
"semantic-release": "^17.0.0",
"sinon": "^9.0.0",
"pkg": "^5.0.0",
"semantic-release": "^18.0.0",
"sinon": "^11.0.0",
"strip-ansi": "^6.0.0",
"typescript": "^4.0.0"
},
"engines": {
@@ -69,10 +76,5 @@
"broken",
"link",
"checker"
],
"c8": {
"exclude": [
"build/test"
]
}
]
}
2 changes: 1 addition & 1 deletion renovate.json
@@ -1,7 +1,7 @@
{
"extends": [
"config:base",
"docker:disable"
":disableDependencyDashboard"
],
"pinVersions": false,
"rebaseStalePrs": true
237 changes: 180 additions & 57 deletions src/cli.ts
@@ -3,9 +3,16 @@
import * as meow from 'meow';
import * as updateNotifier from 'update-notifier';
import chalk = require('chalk');
import {LinkChecker, LinkState, LinkResult, CheckOptions} from './index';
import {
LinkChecker,
LinkState,
LinkResult,
CheckOptions,
RetryInfo,
} from './index';
import {promisify} from 'util';
import {Flags, getConfig} from './config';
import {Format, Logger, LogLevel} from './logger';

// eslint-disable-next-line @typescript-eslint/no-var-requires
const toCSV = promisify(require('jsonexport'));
@@ -14,6 +21,8 @@ const toCSV = promisify(require('jsonexport'));
const pkg = require('../../package.json');
updateNotifier({pkg}).notify();

/* eslint-disable no-process-exit */

const cli = meow(
`
Usage
@@ -22,32 +31,55 @@ const cli = meow(
Positional arguments
LOCATION
Required. Either the URL or the path on disk to check for broken links.
Required. Either the URLs or the paths on disk to check for broken links.
Flags
--concurrency
The number of connections to make simultaneously. Defaults to 100.
--config
Path to the config file to use. Looks for \`linkinator.config.json\` by default.
--concurrency
The number of connections to make simultaneously. Defaults to 100.
--directory-listing
Include an automatic directory index file when linking to a directory.
Defaults to 'false'.
--format, -f
Return the data in CSV or JSON format.
--help
Show this command.
--markdown
Automatically parse and scan markdown if scanning from a location on disk.
--recurse, -r
Recursively follow links on the same root domain.
--skip, -s
List of urls in regexy form to not include in the check.
    --retry
Automatically retry requests that return HTTP 429 responses and include
a 'retry-after' header. Defaults to false.
--format, -f
Return the data in CSV or JSON format.
--server-root
    When scanning a local directory, customize the location on disk
where the server is started. Defaults to the path passed in [LOCATION].
--silent
Only output broken links
--skip, -s
List of urls in regexy form to not include in the check.
--timeout
Request timeout in ms. Defaults to 0 (no timeout).
--help
Show this command.
--url-rewrite-search
Pattern to search for in urls. Must be used with --url-rewrite-replace.
--url-rewrite-replace
Expression used to replace search content. Must be used with --url-rewrite-search.
--verbosity
Override the default verbosity for this command. Available options are
'debug', 'info', 'warning', 'error', and 'none'. Defaults to 'warning'.
Examples
$ linkinator docs/
@@ -65,6 +97,13 @@ const cli = meow(
format: {type: 'string', alias: 'f'},
silent: {type: 'boolean'},
timeout: {type: 'number'},
markdown: {type: 'boolean'},
serverRoot: {type: 'string'},
verbosity: {type: 'string'},
directoryListing: {type: 'boolean'},
retry: {type: 'boolean'},
urlRewriteSearch: {type: 'string'},
urlReWriteReplace: {type: 'string'},
},
booleanDefault: undefined,
}
@@ -73,136 +112,220 @@ const cli = meow(
let flags: Flags;

async function main() {
if (cli.input.length !== 1) {
if (cli.input.length < 1) {
cli.showHelp();
return;
}
flags = await getConfig(cli.flags);
if (
(flags.urlRewriteReplace && !flags.urlRewriteSearch) ||
(flags.urlRewriteSearch && !flags.urlRewriteReplace)
) {
throw new Error(
'The url-rewrite-replace flag must be used with the url-rewrite-search flag.'
);
}

const start = Date.now();
const verbosity = parseVerbosity(flags);
const format = parseFormat(flags);
const logger = new Logger(verbosity, format);

logger.error(`🏊‍♂️ crawling ${cli.input}`);

const checker = new LinkChecker();
// checker.on('pagestart', url => {
// if (!flags.silent) {
// log(`\n Scanning ${chalk.grey(url)}`);
// }
// });
checker.on('retry', (info: RetryInfo) => {
logger.warn(`Retrying: ${info.url} in ${info.secondsUntilRetry} seconds.`);
});
checker.on('link', (link: LinkResult) => {
let state = '';
switch (link.state) {
case LinkState.BROKEN:
state = `[${chalk.red(link.status!.toString())}]`;
logger.error(`${state} ${chalk.gray(link.url)}`);
break;
case LinkState.OK:
state = `[${chalk.green(link.status!.toString())}]`;
logger.warn(`${state} ${chalk.gray(link.url)}`);
break;
case LinkState.SKIPPED:
state = `[${chalk.grey('SKP')}]`;
logger.info(`${state} ${chalk.gray(link.url)}`);
break;
default:
throw new Error('Invalid state.');
}
});
const opts: CheckOptions = {
path: cli.input,
recurse: flags.recurse,
timeout: Number(flags.timeout),
markdown: flags.markdown,
concurrency: Number(flags.concurrency),
serverRoot: flags.serverRoot,
directoryListing: flags.directoryListing,
retry: flags.retry,
};
if (flags.skip) {
if (typeof flags.skip === 'string') {
opts.linksToSkip = flags.skip.split(/[\s,]+/).filter(x => !!x);
} else if (Array.isArray(flags.skip)) {
opts.linksToSkip = flags.skip;
}
}
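The skip flag's split accepts both whitespace- and comma-separated pattern lists. A minimal standalone sketch of that behavior:

```typescript
// Sketch: --skip values are split on any run of whitespace and/or commas,
// so "a b", "a,b", and "a, b" all produce the same pattern list.
const skip = 'LICENSE.md, unlinked.md  extra.md';
const linksToSkip = skip.split(/[\s,]+/).filter(x => !!x);
```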
if (flags.urlRewriteSearch && flags.urlRewriteReplace) {
opts.urlRewriteExpressions = [
{
pattern: new RegExp(flags.urlRewriteSearch),
replacement: flags.urlRewriteReplace,
},
];
}
const result = await checker.check(opts);

const filteredResults = result.links.filter(link => {
switch (link.state) {
case LinkState.OK:
return verbosity <= LogLevel.WARNING;
case LinkState.BROKEN:
if (verbosity > LogLevel.DEBUG) {
link.failureDetails = undefined;
}
return verbosity <= LogLevel.ERROR;
case LinkState.SKIPPED:
return verbosity <= LogLevel.INFO;
}
});
if (format === Format.JSON) {
result.links = filteredResults;
console.log(JSON.stringify(result, null, 2));
return;
} else if (format === Format.CSV) {
result.links = filteredResults;
const csv = await toCSV(result.links);
console.log(csv);
return;
} else {
// Build a collection scanned links, collated by the parent link used in
// the scan. For example:
// {
// "./README.md": [
// {
// url: "https://img.shields.io/npm/v/linkinator.svg",
// status: 200
// ....
// }
// ],
// }
const parents = result.links.reduce((acc, curr) => {
const parent = curr.parent || '';
if (!acc[parent]) {
acc[parent] = [];
}
acc[parent].push(curr);
return acc;
}, {} as {[index: string]: LinkResult[]});

Object.keys(parents).forEach(parent => {
// prune links based on verbosity
const links = parents[parent].filter(link => {
if (verbosity === LogLevel.NONE) {
return false;
}
if (link.state === LinkState.BROKEN) {
return true;
}
if (link.state === LinkState.OK) {
if (verbosity <= LogLevel.WARNING) {
return true;
}
}
if (link.state === LinkState.SKIPPED) {
if (verbosity <= LogLevel.INFO) {
return true;
}
}
return false;
});
if (links.length === 0) {
return;
}
logger.error(chalk.blue(parent));
links.forEach(link => {
let state = '';
switch (link.state) {
case LinkState.BROKEN:
state = `[${chalk.red(link.status!.toString())}]`;
logger.error(` ${state} ${chalk.gray(link.url)}`);
logger.debug(JSON.stringify(link.failureDetails, null, 2));
break;
case LinkState.OK:
state = `[${chalk.green(link.status!.toString())}]`;
logger.warn(` ${state} ${chalk.gray(link.url)}`);
break;
case LinkState.SKIPPED:
state = `[${chalk.grey('SKP')}]`;
logger.info(` ${state} ${chalk.gray(link.url)}`);
break;
default:
throw new Error('Invalid state.');
}
});
});
}

const total = (Date.now() - start) / 1000;

const scannedLinks = result.links.filter(x => x.state !== LinkState.SKIPPED);
if (!result.passed) {
const borked = result.links.filter(x => x.state === LinkState.BROKEN);
logger.error(
chalk.bold(
`${chalk.red('ERROR')}: Detected ${
borked.length
} broken links. Scanned ${chalk.yellow(
scannedLinks.length.toString()
)} links in ${chalk.cyan(total.toString())} seconds.`
)
);
// eslint-disable-next-line no-process-exit
process.exit(1);
}

logger.error(
chalk.bold(
`🤖 Successfully scanned ${chalk.green(
scannedLinks.length.toString()
)} links in ${chalk.cyan(total.toString())} seconds.`
)
);
}

function parseVerbosity(flags: Flags): LogLevel {
if (flags.silent && flags.verbosity) {
throw new Error(
'The SILENT and VERBOSITY flags cannot both be defined. Please consider using VERBOSITY only.'
);
}
if (flags.silent) {
return LogLevel.ERROR;
}
if (!flags.verbosity) {
return LogLevel.WARNING;
}
const verbosity = flags.verbosity.toUpperCase();
const options = Object.values(LogLevel);
if (!options.includes(verbosity)) {
throw new Error(
`Invalid flag: VERBOSITY must be one of [${options.join(',')}]`
);
}
return LogLevel[verbosity as keyof typeof LogLevel];
}
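The numeric ordering of LogLevel is what makes these comparisons work: a lower level means more output. A self-contained sketch of the same parsing; `toLevel` is a hypothetical helper name, not part of linkinator's API:

```typescript
// Standalone sketch of the verbosity parsing above; `toLevel` is a
// hypothetical helper, not part of linkinator's API.
enum LogLevel {
  DEBUG = 0,
  INFO = 1,
  WARNING = 2,
  ERROR = 3,
  NONE = 4,
}

function toLevel(verbosity?: string, silent?: boolean): LogLevel {
  if (silent) {
    return LogLevel.ERROR; // --silent only surfaces broken links
  }
  if (!verbosity) {
    return LogLevel.WARNING; // default: broken and ok links
  }
  const key = verbosity.toUpperCase();
  if (!(key in LogLevel)) {
    throw new Error(`Invalid verbosity: ${verbosity}`);
  }
  return LogLevel[key as keyof typeof LogLevel];
}
```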

function parseFormat(flags: Flags): Format {
if (!flags.format) {
return Format.TEXT;
}
flags.format = flags.format.toUpperCase();
const options = Object.values(Format);
if (!options.includes(flags.format)) {
throw new Error("Invalid flag: FORMAT must be 'TEXT', 'JSON', or 'CSV'.");
}
return Format[flags.format as keyof typeof Format];
}

main();
7 changes: 7 additions & 0 deletions src/config.ts
Original file line number Diff line number Diff line change
@@ -10,7 +10,14 @@ export interface Flags {
skip?: string;
format?: string;
silent?: boolean;
verbosity?: string;
timeout?: number;
markdown?: boolean;
serverRoot?: string;
directoryListing?: boolean;
retry?: boolean;
urlRewriteSearch?: string;
urlRewriteReplace?: string;
}

export async function getConfig(flags: Flags) {
324 changes: 238 additions & 86 deletions src/index.ts

Large diffs are not rendered by default.

75 changes: 50 additions & 25 deletions src/links.ts
@@ -1,4 +1,5 @@
import * as cheerio from 'cheerio';
import * as htmlParser from 'htmlparser2/lib/WritableStream';
import {Readable} from 'stream';
import {URL} from 'url';

const linksAttr = {
@@ -9,6 +10,7 @@ const linksAttr = {
icon: ['command'],
longdesc: ['frame', 'iframe'],
manifest: ['html'],
content: ['meta'],
poster: ['video'],
pluginspage: ['embed'],
pluginurl: ['embed'],
@@ -26,44 +28,67 @@ const linksAttr = {
],
srcset: ['img', 'source'],
} as {[index: string]: string[]};
// Create lookup table for tag name to attribute that contains URL:
const tagAttr: {[index: string]: string[]} = {};
Object.keys(linksAttr).forEach(attr => {
for (const tag of linksAttr[attr]) {
if (!tagAttr[tag]) tagAttr[tag] = [];
tagAttr[tag].push(attr);
}
});

export interface ParsedUrl {
link: string;
error?: Error;
url?: URL;
}

export async function getLinks(
  source: Readable,
  baseUrl: string
): Promise<ParsedUrl[]> {
  let realBaseUrl = baseUrl;
  let baseSet = false;
  const links = new Array<ParsedUrl>();
  const parser = new htmlParser.WritableStream({
    onopentag(tag: string, attributes: {[s: string]: string}) {
      // Allow alternate base URL to be specified in tag:
      if (tag === 'base' && !baseSet) {
        realBaseUrl = getBaseUrl(attributes.href, baseUrl);
        baseSet = true;
      }

      // ignore href properties for link tags where rel is likely to fail
      const relValuesToIgnore = ['dns-prefetch', 'preconnect'];
      if (tag === 'link' && relValuesToIgnore.includes(attributes.rel)) {
        return;
      }

      // Only for <meta content=""> tags, only validate the url if
      // the content actually looks like a url
      if (tag === 'meta' && attributes.content) {
        try {
          new URL(attributes.content);
        } catch (e) {
          return;
        }
      }

      if (tagAttr[tag]) {
        for (const attr of tagAttr[tag]) {
          const linkStr = attributes[attr];
          if (linkStr) {
            for (const link of parseAttr(attr, linkStr)) {
              links.push(parseLink(link, realBaseUrl));
            }
          }
        }
      }
    },
  });
  await new Promise((resolve, reject) => {
    source.pipe(parser).on('finish', resolve).on('error', reject);
  });
  return links;
}

@@ -104,6 +129,6 @@ function parseLink(link: string, baseUrl: string): ParsedUrl {
url.hash = '';
return {link, url};
} catch (error) {
return {link, error: error as Error};
}
}
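parseAttr itself is elided from this diff. For illustration only, srcset splitting typically treats each comma-separated candidate's first whitespace-delimited token as the URL; this sketch assumes that behavior rather than reproducing linkinator's actual implementation:

```typescript
// Illustrative only: parseAttr is not shown in this diff. A srcset value
// like "a.png 1x, b.png 2x" yields one candidate URL per comma-separated
// entry (the first whitespace-delimited token of each entry).
function parseSrcset(value: string): string[] {
  return value
    .split(',')
    .map(candidate => candidate.trim().split(/\s+/)[0])
    .filter(x => !!x);
}
```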
49 changes: 49 additions & 0 deletions src/logger.ts
@@ -0,0 +1,49 @@
export enum LogLevel {
DEBUG = 0,
INFO = 1,
WARNING = 2,
ERROR = 3,
NONE = 4,
}

export enum Format {
TEXT,
JSON,
CSV,
}

export class Logger {
public level: LogLevel;
public format: Format;

constructor(level: LogLevel, format: Format) {
this.level = level;
this.format = format;
}

debug(message?: string) {
if (this.level <= LogLevel.DEBUG && this.format === Format.TEXT) {
console.debug(message);
}
}

info(message?: string) {
if (this.level <= LogLevel.INFO && this.format === Format.TEXT) {
console.info(message);
}
}

warn(message?: string) {
if (this.level <= LogLevel.WARNING && this.format === Format.TEXT) {
// note: this is `console.log` on purpose. `console.warn` maps to
// `console.error`, which would print these messages to stderr.
console.log(message);
}
}

error(message?: string) {
if (this.level <= LogLevel.ERROR && this.format === Format.TEXT) {
console.error(message);
}
}
}
149 changes: 149 additions & 0 deletions src/options.ts
@@ -0,0 +1,149 @@
import * as fs from 'fs';
import * as util from 'util';
import * as path from 'path';
import * as globby from 'glob';

const stat = util.promisify(fs.stat);
const glob = util.promisify(globby);

export interface UrlRewriteExpression {
pattern: RegExp;
replacement: string;
}

export interface CheckOptions {
concurrency?: number;
port?: number;
path: string | string[];
recurse?: boolean;
timeout?: number;
markdown?: boolean;
linksToSkip?: string[] | ((link: string) => Promise<boolean>);
serverRoot?: string;
directoryListing?: boolean;
retry?: boolean;
urlRewriteExpressions?: UrlRewriteExpression[];
}

export interface InternalCheckOptions extends CheckOptions {
syntheticServerRoot?: string;
staticHttpServerHost?: string;
}

/**
* Validate the provided flags all work with each other.
* @param options CheckOptions passed in from the CLI (or API)
*/
export async function processOptions(
opts: CheckOptions
): Promise<InternalCheckOptions> {
const options = Object.assign({}, opts) as InternalCheckOptions;

// ensure at least one path is provided
if (options.path.length === 0) {
throw new Error('At least one path must be provided');
}

// normalize options.path to an array of strings
if (!Array.isArray(options.path)) {
options.path = [options.path];
}

// disable directory listings by default
if (options.directoryListing === undefined) {
options.directoryListing = false;
}

// Ensure we do not mix http:// and file system paths. The paths passed in
// must all be filesystem paths, or HTTP paths.
let isUrlType: boolean | undefined = undefined;
for (const path of options.path) {
const innerIsUrlType = path.startsWith('http');
if (isUrlType === undefined) {
isUrlType = innerIsUrlType;
} else if (innerIsUrlType !== isUrlType) {
throw new Error(
'Paths cannot be mixed between HTTP and local filesystem paths.'
);
}
}

// if there is a server root, make sure there are no HTTP paths
if (options.serverRoot && isUrlType) {
throw new Error(
"'serverRoot' cannot be defined when the 'path' points to an HTTP endpoint."
);
}

if (options.serverRoot) {
options.serverRoot = path.normalize(options.serverRoot);
}

// expand globs into paths
if (!isUrlType) {
const paths: string[] = [];
for (const filePath of options.path) {
// The glob path provided is relative to the serverRoot. For example,
// if the serverRoot is test/fixtures/nested, and the glob is "*/*.html",
// The glob needs to be calculated from the serverRoot directory.
const fullPath = options.serverRoot
? path.join(options.serverRoot, filePath)
: filePath;
const expandedPaths = await glob(fullPath);
if (expandedPaths.length === 0) {
throw new Error(
`The provided glob "${filePath}" returned 0 results. The current working directory is "${process.cwd()}".`
);
}
// After resolving the globs, the paths need to be returned to their
// original form, without the serverRoot included in the path.
for (let p of expandedPaths) {
p = path.normalize(p);
if (options.serverRoot) {
const contractedPath = p
.split(path.sep)
.slice(options.serverRoot.split(path.sep).length)
.join(path.sep);
paths.push(contractedPath);
} else {
paths.push(p);
}
}
}
options.path = paths;
}

// enable markdown if someone passes a markdown path or glob directly
if (options.markdown === undefined) {
for (const p of options.path) {
if (path.extname(p).toLowerCase() === '.md') {
options.markdown = true;
}
}
}

// Figure out which directory should be used as the root for the web server,
// and how that impacts the path to the file for the first request.
if (!options.serverRoot && !isUrlType) {
// if the serverRoot wasn't defined, and there are multiple paths, just
// use process.cwd().
if (options.path.length > 1) {
options.serverRoot = process.cwd();
} else {
// if there's a single path, try to be smart and figure it out
const s = await stat(options.path[0]);
options.serverRoot = options.path[0];
if (s.isFile()) {
const pathParts = options.path[0].split(path.sep);
options.path = [path.join('.', pathParts[pathParts.length - 1])];
options.serverRoot =
pathParts.slice(0, pathParts.length - 1).join(path.sep) || '.';
} else {
options.serverRoot = options.path[0];
options.path = '/';
}
options.syntheticServerRoot = options.serverRoot;
}
}
return options;
}
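The glob "contraction" in processOptions (strip the serverRoot segments from an expanded path so it becomes relative again) can be sketched in isolation; `contract` is a hypothetical helper name, not part of the module above:

```typescript
import * as path from 'path';

// Sketch of the serverRoot "contraction" in processOptions: after a glob
// expands relative to serverRoot, strip the serverRoot segments so the
// result is once again relative. `contract` is a hypothetical name.
function contract(expandedPath: string, serverRoot: string): string {
  return path
    .normalize(expandedPath)
    .split(path.sep)
    .slice(serverRoot.split(path.sep).length)
    .join(path.sep);
}
```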
75 changes: 75 additions & 0 deletions src/queue.ts
@@ -0,0 +1,75 @@
import {EventEmitter} from 'events';

export interface QueueOptions {
concurrency: number;
}

export interface QueueItemOptions {
delay?: number;
}

interface QueueItem {
fn: AsyncFunction;
timeToRun: number;
}

export declare interface Queue {
on(event: 'done', listener: () => void): this;
}

export type AsyncFunction = () => Promise<void>;

export class Queue extends EventEmitter {
private q: Array<QueueItem> = [];
private activeFunctions = 0;
private concurrency: number;

constructor(options: QueueOptions) {
super();
this.concurrency = options.concurrency;
}

add(fn: AsyncFunction, options?: QueueItemOptions) {
const delay = options?.delay || 0;
const timeToRun = Date.now() + delay;
this.q.push({
fn,
timeToRun,
});
setTimeout(() => this.tick(), delay);
}

private tick() {
// Check if we're complete
if (this.activeFunctions === 0 && this.q.length === 0) {
this.emit('done');
return;
}

for (let i = 0; i < this.q.length; i++) {
// Check if we have too many concurrent functions executing
if (this.activeFunctions >= this.concurrency) {
return;
}
// grab the element at the front of the array
const item = this.q.shift()!;
// make sure this element is ready to execute - if not, move it to the back of the queue
if (item.timeToRun > Date.now()) {
this.q.push(item);
} else {
// this function is ready to go!
this.activeFunctions++;
item.fn().finally(() => {
this.activeFunctions--;
this.tick();
});
}
}
}

async onIdle() {
return new Promise<void>(resolve => {
this.on('done', () => resolve());
});
}
}
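The Queue above limits in-flight work by counting active functions and re-checking whenever one settles. The same idea in a self-contained sketch (returning the peak concurrency observed, purely for demonstration; this is not the Queue class itself):

```typescript
// Self-contained sketch of the concurrency-limiting idea used by Queue:
// start work only while fewer than `concurrency` tasks are active, and
// re-check whenever a task settles. Resolves with the peak concurrency.
type AsyncFunction = () => Promise<void>;

function runWithConcurrency(
  fns: AsyncFunction[],
  concurrency: number
): Promise<number> {
  let active = 0;
  let peak = 0;
  const q = fns.slice();
  return new Promise<number>(resolve => {
    const tick = () => {
      // Check if we're complete
      if (active === 0 && q.length === 0) {
        resolve(peak);
        return;
      }
      while (active < concurrency && q.length > 0) {
        const fn = q.shift()!;
        active++;
        peak = Math.max(peak, active);
        fn().finally(() => {
          active--;
          tick();
        });
      }
    };
    tick();
  });
}
```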
123 changes: 123 additions & 0 deletions src/server.ts
@@ -0,0 +1,123 @@
import {AddressInfo} from 'net';
import * as http from 'http';
import * as path from 'path';
import * as fs from 'fs';
import {promisify} from 'util';
import * as marked from 'marked';
import * as mime from 'mime';
import {URL} from 'url';
import escape = require('escape-html');
import enableDestroy = require('server-destroy');

const readFile = promisify(fs.readFile);
const stat = promisify(fs.stat);
const readdir = promisify(fs.readdir);

export interface WebServerOptions {
// The local path that should be mounted as a static web server
root: string;
// The port on which to start the local web server
port?: number;
// If markdown should be automatically compiled and served
markdown?: boolean;
// Should directories automatically serve an index page
directoryListing?: boolean;
}

/**
* Spin up a local HTTP server to serve static requests from disk
* @private
* @returns Promise that resolves with the instance of the HTTP server
*/
export async function startWebServer(options: WebServerOptions) {
const root = path.resolve(options.root);
return new Promise<http.Server>((resolve, reject) => {
const server = http
.createServer((req, res) => handleRequest(req, res, root, options))
.listen(options.port || 0, () => resolve(server))
.on('error', reject);
if (!options.port) {
const addr = server.address() as AddressInfo;
options.port = addr.port;
}
enableDestroy(server);
});
}

async function handleRequest(
req: http.IncomingMessage,
res: http.ServerResponse,
root: string,
options: WebServerOptions
) {
const url = new URL(req.url || '/', `http://localhost:${options.port}`);
const pathParts = url.pathname.split('/').filter(x => !!x);
const originalPath = path.join(root, ...pathParts);
if (url.pathname.endsWith('/')) {
pathParts.push('index.html');
}
const localPath = path.join(root, ...pathParts);
if (!localPath.startsWith(root)) {
res.writeHead(500);
res.end();
return;
}
const maybeListing =
options.directoryListing && localPath.endsWith(`${path.sep}index.html`);

try {
const stats = await stat(localPath);
const isDirectory = stats.isDirectory();
if (isDirectory) {
// this means we got a path with no / at the end!
const doc = "<html><body>Redirectin'</body></html>";
res.statusCode = 301;
res.setHeader('Content-Type', 'text/html; charset=UTF-8');
res.setHeader('Content-Length', Buffer.byteLength(doc));
res.setHeader('Location', req.url + '/');
res.end(doc);
return;
}
} catch (err) {
if (!maybeListing) {
return return404(res, err);
}
}

try {
let data = await readFile(localPath, {encoding: 'utf8'});
let mimeType = mime.getType(localPath);
const isMarkdown = req.url?.toLocaleLowerCase().endsWith('.md');
if (isMarkdown && options.markdown) {
data = marked(data, {gfm: true});
mimeType = 'text/html; charset=UTF-8';
}
res.setHeader('Content-Type', mimeType!);
res.setHeader('Content-Length', Buffer.byteLength(data));
res.writeHead(200);
res.end(data);
} catch (err) {
if (maybeListing) {
try {
const files = await readdir(originalPath);
const fileList = files
.filter(f => escape(f))
.map(f => `<li><a href="${f}">${f}</a></li>`)
.join('\r\n');
const data = `<html><body><ul>${fileList}</ul></body></html>`;
res.writeHead(200);
res.end(data);
return;
} catch (err) {
return return404(res, err);
}
} else {
return return404(res, err);
}
}
}

function return404(res: http.ServerResponse, err: Error) {
res.writeHead(404);
res.end(JSON.stringify(err));
}
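The startsWith guard in handleRequest is what blocks path traversal: URL segments are joined onto the resolved root, and anything that escapes it is refused. In isolation, with a hypothetical `isWithinRoot` helper (not part of the server above):

```typescript
import * as path from 'path';

// Sketch of handleRequest's traversal guard: join the URL segments onto
// the resolved root and refuse anything that resolves outside it.
// `isWithinRoot` is a hypothetical helper name for illustration.
function isWithinRoot(root: string, urlPathname: string): boolean {
  const resolvedRoot = path.resolve(root);
  const pathParts = urlPathname.split('/').filter(x => !!x);
  const localPath = path.join(resolvedRoot, ...pathParts);
  return localPath.startsWith(resolvedRoot);
}
```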
3 changes: 2 additions & 1 deletion test/fixtures/config/linkinator.config.json
@@ -3,5 +3,6 @@
"recurse": true,
"silent": true,
"concurrency": 17,
"skip": "🌳"
"skip": "🌳",
"directoryListing": false
}
6 changes: 6 additions & 0 deletions test/fixtures/config/skip-array-config.json
@@ -0,0 +1,6 @@
{
"skip": [
"fake.local",
"fake2.local"
]
}
1 change: 1 addition & 0 deletions test/fixtures/directoryIndex/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
This has links to a [directory with files](dir1/) and an [empty directory](dir2/).
1 change: 1 addition & 0 deletions test/fixtures/directoryIndex/dir1/dir1.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
👋
Empty file.
5 changes: 5 additions & 0 deletions test/fixtures/local/index.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
<html>
<body>
<a href="page2.html">just follow a link</a>
</body>
</html>
5 changes: 5 additions & 0 deletions test/fixtures/local/page2.html
@@ -0,0 +1,5 @@
<html>
<body>
nothing to see here
</body>
</html>
21 changes: 21 additions & 0 deletions test/fixtures/markdown/LICENSE.md
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) Justin Beckwith <justin.beckwith@gmail.com> (jbeckwith.com)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
6 changes: 6 additions & 0 deletions test/fixtures/markdown/README.md
@@ -0,0 +1,6 @@
# Say hello to my README
This has [a link](LICENSE.md) to something.

Also here is my cat.
![booboobadkitteh](boo.jpg)

Binary file added test/fixtures/markdown/boo.jpg
4 changes: 4 additions & 0 deletions test/fixtures/markdown/deep/deep.md
@@ -0,0 +1,4 @@
# i am in a folder
I just happen to be in a folder.

This has [a link](../LICENSE.md) to something.
4 changes: 4 additions & 0 deletions test/fixtures/markdown/unlinked.md
@@ -0,0 +1,4 @@
# just hanging out
This is a markdown file that isn't directly linked into. Useful for testing globs.

This has [a link](LICENSE.md) to something.
5 changes: 5 additions & 0 deletions test/fixtures/nested/doll1/index.html
@@ -0,0 +1,5 @@
<html>
<body>
<a href="http://fake.local/doll1">just follow a link</a>
</body>
</html>
5 changes: 5 additions & 0 deletions test/fixtures/nested/doll2/index.html
@@ -0,0 +1,5 @@
<html>
<body>
<a href="http://fake.local/doll2">just follow a link</a>
</body>
</html>
7 changes: 7 additions & 0 deletions test/fixtures/retry/index.html
@@ -0,0 +1,7 @@
<html>
<body>
<a href="http://fake.local/1">linky</a>
<a href="http://fake.local/3">linky</a>
<a href="subpage.html">subpage!</a>
</body>
</html>
5 changes: 5 additions & 0 deletions test/fixtures/retry/subpage.html
@@ -0,0 +1,5 @@
<html>
<body>
<a href="http://fake.local/2">linky</a>
</body>
</html>
5 changes: 5 additions & 0 deletions test/fixtures/retryCLI/index.html
@@ -0,0 +1,5 @@
<html>
<body>
<a href="http://localhost:3333">linky</a>
</body>
</html>
21 changes: 21 additions & 0 deletions test/fixtures/rewrite/LICENSE.md
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) Justin Beckwith <justin.beckwith@gmail.com> (jbeckwith.com)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
2 changes: 2 additions & 0 deletions test/fixtures/rewrite/README.md
@@ -0,0 +1,2 @@
# Say hello to my README
This has [a link](NOTLICENSE.md) to something.
5 changes: 5 additions & 0 deletions test/fixtures/server/5.0/index.html
@@ -0,0 +1,5 @@
<html>
<body>
hello
</body>
</html>
5 changes: 5 additions & 0 deletions test/fixtures/server/bag/bag.html
@@ -0,0 +1,5 @@
<html>
<body>
hello
</body>
</html>
5 changes: 5 additions & 0 deletions test/fixtures/server/index.html
@@ -0,0 +1,5 @@
<html>
<body>
hello
</body>
</html>
1 change: 1 addition & 0 deletions test/fixtures/server/script.js
@@ -0,0 +1 @@
alert('ohai');
5 changes: 5 additions & 0 deletions test/fixtures/server/test.html
@@ -0,0 +1,5 @@
<html>
<body>
hello
</body>
</html>
4 changes: 4 additions & 0 deletions test/fixtures/srcset/_site/bar.html
@@ -0,0 +1,4 @@
<html>
<body>
</body>
</html>
4 changes: 4 additions & 0 deletions test/fixtures/srcset/_site/foo.html
@@ -0,0 +1,4 @@
<html>
<body>
</body>
</html>
5 changes: 5 additions & 0 deletions test/fixtures/srcset/index.html
@@ -0,0 +1,5 @@
<html>
<body>
<img srcset="_site/foo.html, _site/bar.html"></img>
</body>
</html>
11 changes: 11 additions & 0 deletions test/fixtures/twittercard/index.html
@@ -0,0 +1,11 @@
<html>
<head>
<meta name="twitter:card" content="summary"></meta>
<meta name="twitter:creator" content="@justinbeckwith" />
<meta name="twitter:site" content="@justinbeckwith" />
<meta property="og:url" content="http://fake.local/" />
<meta property="og:title" content="A Twitter for My Sister" />
<meta property="og:description" content="In the early days, Twitter grew so quickly that it was almost impossible to add new features because engineers spent their time trying to keep the rocket ship from stalling." />
<meta property="og:image" content="http://fake.local" />
</head>
</html>
271 changes: 271 additions & 0 deletions test/test.cli.ts
@@ -0,0 +1,271 @@
import {describe, it} from 'mocha';
import * as execa from 'execa';
import {assert} from 'chai';
import * as http from 'http';
import * as util from 'util';
import stripAnsi = require('strip-ansi');
import enableDestroy = require('server-destroy');
import {LinkResult, LinkState} from '../src/index';

// eslint-disable-next-line prefer-arrow-callback
describe('cli', function () {
let server: http.Server;
this.timeout(20_000);

const pkg = require('../../package.json');
const linkinator = pkg.bin.linkinator;
const node = 'node';

afterEach(async () => {
if (server) {
await util.promisify(server.destroy)();
}
});

it('should show output for failures', async () => {
const res = await execa(node, [linkinator, 'test/fixtures/basic'], {
reject: false,
});
assert.match(stripAnsi(res.stderr), /ERROR: Detected 1 broken links/);
});

it('should pass successful markdown scan', async () => {
const res = await execa(node, [
linkinator,
'test/fixtures/markdown/README.md',
]);
assert.match(res.stderr, /Successfully scanned/);
});

it('should allow multiple paths', async () => {
const res = await execa(node, [
linkinator,
'test/fixtures/markdown/unlinked.md',
'test/fixtures/markdown/README.md',
]);
assert.match(res.stderr, /Successfully scanned/);
});

it('should show help if no params are provided', async () => {
const res = await execa(node, [linkinator], {
reject: false,
});
assert.match(res.stdout, /\$ linkinator LOCATION \[ --arguments \]/);
});

it('should flag skipped links', async () => {
const res = await execa(node, [
linkinator,
'--verbosity',
'INFO',
'--skip',
'"LICENSE.md, unlinked.md"',
'test/fixtures/markdown/README.md',
]);
const stdout = stripAnsi(res.stdout);
const stderr = stripAnsi(res.stderr);
assert.match(stdout, /\[SKP\]/);
// make sure we don't report skipped links in the count
assert.match(stderr, /scanned 2 links/);
});

it('should provide CSV if asked nicely', async () => {
const res = await execa(node, [
linkinator,
'--format',
'csv',
'test/fixtures/markdown/README.md',
]);
assert.match(res.stdout, /README.md,200,OK,/);
});

it('should provide JSON if asked nicely', async () => {
const res = await execa(node, [
linkinator,
'--format',
'json',
'test/fixtures/markdown/README.md',
]);
const output = JSON.parse(res.stdout);
assert.ok(output.links);
});

it('should not show links if --silent', async () => {
const res = await execa(node, [
linkinator,
'--silent',
'test/fixtures/markdown/README.md',
]);
assert.notMatch(res.stdout, /\[/);
});

it('should not show 200 links if verbosity is ERROR with JSON', async () => {
const res = await execa(node, [
linkinator,
'--verbosity',
'ERROR',
'--format',
'JSON',
'test/fixtures/markdown/README.md',
]);
const links = JSON.parse(res.stdout).links as LinkResult[];
for (const link of links) {
assert.strictEqual(link.state, LinkState.BROKEN);
}
});

it('should accept a server-root', async () => {
const res = await execa(node, [
linkinator,
'--markdown',
'--server-root',
'test/fixtures/markdown',
'README.md',
]);
assert.match(res.stderr, /Successfully scanned/);
});

it('should accept globs', async () => {
const res = await execa(node, [
linkinator,
'test/fixtures/markdown/*.md',
'test/fixtures/markdown/**/*.md',
]);
assert.match(res.stderr, /Successfully scanned/);
});

it('should throw on invalid format', async () => {
const res = await execa(
node,
[linkinator, './README.md', '--format', 'LOL'],
{
reject: false,
}
);
assert.match(res.stderr, /FORMAT must be/);
});

it('should throw on invalid verbosity', async () => {
const res = await execa(
node,
[linkinator, './README.md', '--VERBOSITY', 'LOL'],
{
reject: false,
}
);
assert.match(res.stderr, /VERBOSITY must be/);
});

it('should throw when verbosity and silent are flagged', async () => {
const res = await execa(
node,
[linkinator, './README.md', '--verbosity', 'DEBUG', '--silent'],
{
reject: false,
}
);
assert.match(res.stderr, /The SILENT and VERBOSITY flags/);
});

it('should show no output for verbosity=NONE', async () => {
const res = await execa(
node,
[linkinator, 'test/fixtures/basic', '--verbosity', 'NONE'],
{
reject: false,
}
);
assert.strictEqual(res.exitCode, 1);
assert.strictEqual(res.stdout, '');
assert.strictEqual(res.stderr, '');
});

it('should show callstacks for verbosity=DEBUG', async () => {
const res = await execa(
node,
[linkinator, 'test/fixtures/basic', '--verbosity', 'DEBUG'],
{
reject: false,
}
);
assert.strictEqual(res.exitCode, 1);
assert.match(res.stdout, /reason: getaddrinfo/);
});

it('should allow passing a config', async () => {
const res = await execa(node, [
linkinator,
'test/fixtures/basic',
'--config',
'test/fixtures/config/skip-array-config.json',
]);
assert.strictEqual(res.exitCode, 0);
});

it('should fail if a url search is provided without a replacement', async () => {
const res = await execa(
node,
[linkinator, '--url-rewrite-search', 'boop', 'test/fixtures/basic'],
{
reject: false,
}
);
assert.strictEqual(res.exitCode, 1);
assert.match(res.stderr, /flag must be used/);
});

it('should fail if a url replacement is provided without a search', async () => {
const res = await execa(
node,
[linkinator, '--url-rewrite-replace', 'beep', 'test/fixtures/basic'],
{
reject: false,
}
);
assert.strictEqual(res.exitCode, 1);
assert.match(res.stderr, /flag must be used/);
});

it('should respect url rewrites', async () => {
const res = await execa(node, [
linkinator,
'--url-rewrite-search',
'NOTLICENSE.md',
'--url-rewrite-replace',
'LICENSE.md',
'test/fixtures/rewrite/README.md',
]);
assert.match(res.stderr, /Successfully scanned/);
});

it('should warn on retries', async () => {
// start a web server to return the 429
let requestCount = 0;
let firstRequestTime: number;
const port = 3333;
const delayMillis = 1000;
server = http.createServer((_, res) => {
if (requestCount === 0) {
res.writeHead(429, {
'retry-after': 1,
});
requestCount++;
firstRequestTime = Date.now();
} else {
assert.isAtLeast(Date.now(), firstRequestTime + delayMillis);
res.writeHead(200);
}
res.end();
});
enableDestroy(server);
await new Promise<void>(r => server.listen(port, r));

const res = await execa(node, [
linkinator,
'--retry',
'test/fixtures/retryCLI',
]);
assert.strictEqual(res.exitCode, 0);
assert.include(res.stdout, `Retrying: http://localhost:${port}`);
});
});
273 changes: 272 additions & 1 deletion test/test.ts → test/test.index.ts
@@ -1,17 +1,19 @@
import * as assert from 'assert';
import {assert as assetChai} from 'chai';
import * as gaxios from 'gaxios';
import * as nock from 'nock';
import * as sinon from 'sinon';
import * as path from 'path';
import {describe, it, afterEach} from 'mocha';

-import {check, LinkState, LinkChecker} from '../src';
+import {check, LinkState, LinkChecker, CheckOptions, headers} from '../src';

nock.disableNetConnect();
nock.enableNetConnect('localhost');

describe('linkinator', () => {
afterEach(() => {
sinon.restore();
nock.cleanAll();
});

@@ -280,4 +282,273 @@ describe('linkinator', () => {
});
assert.ok(!results.passed);
});

it('should handle markdown', async () => {
const results = await check({
path: 'test/fixtures/markdown/README.md',
markdown: true,
});
assert.strictEqual(results.links.length, 3);
assert.ok(results.passed);
});

it('should throw an error if you pass server-root and an http based path', async () => {
await assert.rejects(
check({
path: 'https://jbeckwith.com',
serverRoot: process.cwd(),
}),
/cannot be defined/
);
});

it('should allow overriding the server root', async () => {
const results = await check({
serverRoot: 'test/fixtures/markdown',
path: 'README.md',
});
assert.strictEqual(results.links.length, 3);
assert.ok(results.passed);
});

it('should accept multiple filesystem paths', async () => {
const scope = nock('http://fake.local').head('/').reply(200);
const results = await check({
path: ['test/fixtures/basic', 'test/fixtures/image'],
});
assert.strictEqual(results.passed, false);
assert.strictEqual(results.links.length, 6);
scope.done();
});

it('should not allow mixed local and remote paths', async () => {
await assert.rejects(
check({
path: ['https://jbeckwith.com', 'test/fixtures/basic'],
}),
/cannot be mixed/
);
});

it('should require at least one path', async () => {
await assert.rejects(
check({
path: [],
}),
/At least one/
);
});

it('should not pollute the original options after merge', async () => {
const options: CheckOptions = Object.freeze({path: 'test/fixtures/basic'});
const scope = nock('http://fake.local').head('/').reply(200);
const results = await check(options);
assert.ok(results.passed);
scope.done();
assert.strictEqual(options.serverRoot, undefined);
});

it('should accept multiple http paths', async () => {
const scopes = [
nock('http://fake.local')
.get('/')
.replyWithFile(200, 'test/fixtures/local/index.html', {
'Content-Type': 'text/html; charset=UTF-8',
}),
nock('http://fake.local')
.get('/page2.html')
.replyWithFile(200, 'test/fixtures/local/page2.html', {
'Content-Type': 'text/html; charset=UTF-8',
}),
nock('http://fake2.local')
.get('/')
.replyWithFile(200, 'test/fixtures/local/index.html', {
'Content-Type': 'text/html; charset=UTF-8',
}),
nock('http://fake2.local')
.get('/page2.html')
.replyWithFile(200, 'test/fixtures/local/page2.html', {
'Content-Type': 'text/html; charset=UTF-8',
}),
];
const results = await check({
path: ['http://fake.local', 'http://fake2.local'],
});
assert.ok(results.passed);
scopes.forEach(x => x.done());
});

it('should print debug information when the env var is set', async () => {
sinon.stub(process, 'env').value({
LINKINATOR_DEBUG: true,
});
const consoleSpy = sinon.stub(console, 'log');
const results = await check({
path: 'test/fixtures/markdown/README.md',
});
assert.ok(results.passed);
assert.ok(consoleSpy.calledOnce);
});

it('should respect globs', async () => {
const results = await check({
path: 'test/fixtures/markdown/**/*.md',
});
assert.ok(results.passed);
assert.strictEqual(results.links.length, 6);
const licenseLink = results.links.find(x => x.url.endsWith('LICENSE.md'));
assert.ok(licenseLink);
assert.strictEqual(licenseLink.url, 'test/fixtures/markdown/LICENSE.md');
});

it('should autoscan markdown if specifically in path', async () => {
const results = await check({
path: 'test/fixtures/markdown/README.md',
});
assert.ok(results.passed);
assert.strictEqual(results.links.length, 3);
});

it('should throw if a glob provides no paths to scan', async () => {
await assert.rejects(
check({
path: 'test/fixtures/basic/*.md',
}),
/returned 0 results/
);
});

it('should always send a human looking User-Agent', async () => {
const scopes = [
nock('http://fake.local')
.get('/', undefined, {reqheaders: headers})
.replyWithFile(200, 'test/fixtures/local/index.html', {
'Content-Type': 'text/html; charset=UTF-8',
}),
nock('http://fake.local')
.get('/page2.html', undefined, {reqheaders: headers})
.replyWithFile(200, 'test/fixtures/local/page2.html', {
'Content-Type': 'text/html; charset=UTF-8',
}),
];
const results = await check({
path: 'http://fake.local',
});
assert.ok(results.passed);
scopes.forEach(x => x.done());
});

it('should surface call stacks on failures in the API', async () => {
const results = await check({
path: 'http://fake.local',
});
assert.ok(!results.passed);
const err = results.links[0].failureDetails![0] as Error;
assetChai.match(err.message, /Nock: Disallowed net connect for/);
});

it('should respect server root with globs', async () => {
const scope = nock('http://fake.local')
.get('/doll1')
.reply(200)
.get('/doll2')
.reply(200);
const results = await check({
serverRoot: 'test/fixtures/nested',
path: '*/*.html',
});
assert.strictEqual(results.links.length, 4);
assert.ok(results.passed);
scope.done();
});

it('should respect absolute server root', async () => {
const scope = nock('http://fake.local')
.get('/doll1')
.reply(200)
.get('/doll2')
.reply(200);
const results = await check({
serverRoot: path.resolve('test/fixtures/nested'),
path: '*/*.html',
});
assert.strictEqual(results.links.length, 4);
assert.ok(results.passed);
scope.done();
});

it('should scan links in <meta content="URL"> tags', async () => {
const scope = nock('http://fake.local').head('/').reply(200);
const results = await check({path: 'test/fixtures/twittercard'});
assert.ok(results.passed);
scope.done();
assert.strictEqual(results.links.length, 2);
});

it('should support directory index', async () => {
const results = await check({
path: 'test/fixtures/directoryIndex/README.md',
directoryListing: true,
});
assert.ok(results.passed);
assert.strictEqual(results.links.length, 3);
});

  it('should disable directory index by default', async () => {
const results = await check({
path: 'test/fixtures/directoryIndex/README.md',
});
assert.ok(!results.passed);
assert.strictEqual(results.links.length, 3);
});

it('should provide a relative path in the results', async () => {
const scope = nock('http://fake.local').head('/').reply(200);
const results = await check({path: 'test/fixtures/basic'});
assert.strictEqual(results.links.length, 2);
const [rootLink, fakeLink] = results.links;
assert.strictEqual(rootLink.url, path.join('test', 'fixtures', 'basic'));
assert.strictEqual(fakeLink.url, 'http://fake.local/');
scope.done();
});

it('should provide a server root relative path in the results', async () => {
const scope = nock('http://fake.local').head('/').reply(200);
const results = await check({
path: '.',
serverRoot: 'test/fixtures/basic',
});
assert.strictEqual(results.links.length, 2);
const [rootLink, fakeLink] = results.links;
assert.strictEqual(rootLink.url, `.${path.sep}`);
assert.strictEqual(fakeLink.url, 'http://fake.local/');
scope.done();
});

it('should rewrite urls', async () => {
const results = await check({
path: 'test/fixtures/rewrite/README.md',
urlRewriteExpressions: [
{
pattern: /NOTLICENSE\.[a-z]+/,
replacement: 'LICENSE.md',
},
],
});
assert.ok(results.passed);
});

it('should report malformed links as broken', async () => {
const results = await check({path: 'test/fixtures/malformed'});
assert.ok(!results.passed);
assert.strictEqual(
results.links.filter(x => x.state === LinkState.BROKEN).length,
1
);
});

it('should handle comma separated srcset', async () => {
const results = await check({path: 'test/fixtures/srcset'});
assert.ok(results.passed);
});
});
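The "should not pollute the original options after merge" test above relies on `check()` copying the caller's options before applying defaults, so even a frozen options object survives untouched. A minimal sketch of that non-mutating merge (a hypothetical helper for illustration, not linkinator's actual implementation):

```typescript
interface Options {
  path: string | string[];
  serverRoot?: string;
  markdown?: boolean;
}

// Spread the caller's options into a fresh object before normalizing,
// so an Object.freeze()-d options object is never mutated.
function withDefaults(opts: Options): Options {
  const merged: Options = {...opts, markdown: opts.markdown ?? false};
  if (typeof merged.path === 'string') {
    merged.path = [merged.path];
  }
  return merged;
}
```

With this shape, `withDefaults(Object.freeze({path: 'x'}))` returns a normalized copy while the frozen original keeps `serverRoot` and `markdown` undefined, which is exactly what the test asserts.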
213 changes: 213 additions & 0 deletions test/test.retry.ts
@@ -0,0 +1,213 @@
import {assert} from 'chai';
import * as nock from 'nock';
import * as sinon from 'sinon';
import {describe, it, afterEach} from 'mocha';

import {check, LinkChecker} from '../src';

nock.disableNetConnect();
nock.enableNetConnect('localhost');

describe('retries', () => {
afterEach(() => {
sinon.restore();
nock.cleanAll();
});

it('should handle 429s with invalid retry-after headers', async () => {
const scope = nock('http://fake.local').get('/').reply(429, undefined, {
'retry-after': 'totally-not-valid',
});
const results = await check({
path: 'test/fixtures/basic',
retry: true,
});
assert.ok(!results.passed);
scope.done();
});

it('should retry 429s with second based header', async () => {
const scope = nock('http://fake.local')
.get('/')
.reply(429, undefined, {
'retry-after': '10',
})
.get('/')
.reply(200);

const {promise, resolve} = invertedPromise();
const checker = new LinkChecker().on('retry', resolve);
const clock = sinon.useFakeTimers({
shouldAdvanceTime: true,
});
const checkPromise = checker.check({
path: 'test/fixtures/basic',
retry: true,
});

await promise;
await clock.tickAsync(10_000);
const results = await checkPromise;
assert.ok(results.passed);
scope.done();
});

it('should retry 429s after failed HEAD', async () => {
const scope = nock('http://fake.local')
.head('/')
.reply(405)
.get('/')
.reply(429, undefined, {
'retry-after': '10',
})
.get('/')
.reply(200);

const {promise, resolve} = invertedPromise();
const checker = new LinkChecker().on('retry', resolve);
const clock = sinon.useFakeTimers({
shouldAdvanceTime: true,
});
const checkPromise = checker.check({
path: 'test/fixtures/basic',
retry: true,
});
await promise;
await clock.tickAsync(10000);
const results = await checkPromise;
assert.ok(results.passed);
scope.done();
});

it('should retry 429s with date based header', async () => {
const scope = nock('http://fake.local')
.get('/')
.reply(429, undefined, {
'retry-after': '1970-01-01T00:00:10.000Z',
})
.get('/')
.reply(200);

const {promise, resolve} = invertedPromise();
const checker = new LinkChecker().on('retry', resolve);
const clock = sinon.useFakeTimers({
shouldAdvanceTime: true,
});
const checkPromise = checker.check({
path: 'test/fixtures/basic',
retry: true,
});
await promise;
await clock.tickAsync(10000);
const results = await checkPromise;
assert.ok(results.passed);
scope.done();
});

it('should detect requests to wait on the same host', async () => {
const scope = nock('http://fake.local')
.get('/1')
.reply(429, undefined, {
'retry-after': '3',
})
.get('/1', () => {
assert.isAtLeast(Date.now(), 3000);
return true;
})
.reply(200)
.get('/2', () => {
assert.isAtLeast(Date.now(), 3000);
return true;
})
.reply(200)
.get('/3')
.reply(429, undefined, {
'retry-after': '3',
})
.get('/3', () => {
assert.isAtLeast(Date.now(), 3000);
return true;
})
.reply(200);

const {promise, resolve} = invertedPromise();
const checker = new LinkChecker().on('retry', resolve);
const clock = sinon.useFakeTimers({
shouldAdvanceTime: true,
});
const checkPromise = checker.check({
path: 'test/fixtures/retry',
recurse: true,
retry: true,
});
await promise;
await clock.tickAsync(3000);
const results = await checkPromise;
assert.ok(results.passed);
scope.done();
});

it('should increase timeout for followup requests to a host', async () => {
const scope = nock('http://fake.local')
.get('/1')
.reply(429, undefined, {
'retry-after': '3',
})
.get('/1', () => {
// even though the header said to wait 3 seconds, we are checking to
// make sure the /3 route reset it to 9 seconds here. This is common
// when a flood of requests come through and the retry-after gets
// extended.
assert.isAtLeast(Date.now(), 9000);
return true;
})
.reply(200)
.get('/2', () => {
assert.isAtLeast(Date.now(), 9000);
return true;
})
.reply(200)
.get('/3')
.reply(429, undefined, {
'retry-after': '9',
})
.get('/3', () => {
assert.isAtLeast(Date.now(), 9000);
return true;
})
.reply(200);

const {promise: p1, resolve: r1} = invertedPromise();
const {promise: p2, resolve: r2} = invertedPromise();
const checker = new LinkChecker().on('retry', info => {
if (info.url === 'http://fake.local/1') {
r1();
} else if (info.url === 'http://fake.local/3') {
r2();
}
});
const clock = sinon.useFakeTimers({
shouldAdvanceTime: true,
});
const checkPromise = checker.check({
path: 'test/fixtures/retry',
recurse: true,
retry: true,
});
await Promise.all([p1, p2]);
await clock.tickAsync(9000);
const results = await checkPromise;
assert.ok(results.passed);
scope.done();
});

function invertedPromise() {
let resolve!: () => void;
let reject!: (err: Error) => void;
const promise = new Promise<void>((innerResolve, innerReject) => {
resolve = innerResolve;
reject = innerReject;
});
return {promise, resolve, reject};
}
});
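The retry tests above exercise both `retry-after` formats allowed by HTTP: delta-seconds ('10') and an HTTP-date, plus an invalid value that should not be retried at all. A helper that turns the header into a delay might look roughly like this (a sketch under those assumptions, not linkinator's actual parsing code):

```typescript
// Parse a Retry-After header value into a delay in milliseconds.
// Accepts delta-seconds ("10") or a date string; returns undefined for
// invalid values (the tests above expect those links to simply fail
// rather than be retried).
function parseRetryAfter(value: string, now: number = Date.now()): number | undefined {
  const seconds = Number(value);
  if (!Number.isNaN(seconds)) {
    return seconds * 1000;
  }
  const date = Date.parse(value);
  if (!Number.isNaN(date)) {
    return Math.max(0, date - now);
  }
  return undefined; // e.g. 'totally-not-valid'
}
```

Note how the date branch is relative to `now`, which is why the tests pin the clock with `sinon.useFakeTimers` before asserting on the wait.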
79 changes: 79 additions & 0 deletions test/test.server.ts
@@ -0,0 +1,79 @@
import * as assert from 'assert';
import {describe, it, before, after} from 'mocha';
import {startWebServer} from '../src/server';
import {AddressInfo} from 'net';
import {Server} from 'http';
import {request} from 'gaxios';
import * as fs from 'fs';

describe('server', () => {
let server: Server;
let rootUrl: string;
const contents = fs.readFileSync('test/fixtures/server/index.html', 'utf-8');
before(async () => {
server = await startWebServer({
directoryListing: true,
markdown: true,
root: 'test/fixtures/server',
});
const addr = server.address() as AddressInfo;
rootUrl = `http://localhost:${addr.port}`;
});
after(() => server.destroy());

it('should serve basic file', async () => {
const url = rootUrl;
const res = await request({url});
assert.strictEqual(res.data, contents);
const expectedContentType = 'text/html';
assert.strictEqual(res.headers['content-type'], expectedContentType);
});

it('should show a directory listing if asked nicely', async () => {
const url = `${rootUrl}/bag/`;
const res = await request({url});
const expected =
'<html><body><ul><li><a href="bag.html">bag.html</a></li></ul></body></html>';
assert.strictEqual(res.data, expected);
});

it('should serve correct mime type', async () => {
const url = `${rootUrl}/script.js`;
const res = await request({url});
const expectedContentType = 'application/javascript';
assert.strictEqual(res.headers['content-type'], expectedContentType);
});

it('should protect against path escape attacks', async () => {
const url = `${rootUrl}/../../etc/passwd`;
const res = await request({url, validateStatus: () => true});
assert.strictEqual(res.status, 404);
});

it('should return a 404 for missing paths', async () => {
const url = `${rootUrl}/does/not/exist`;
const res = await request({url, validateStatus: () => true});
assert.strictEqual(res.status, 404);
});

it('should work with directories with a .', async () => {
const url = `${rootUrl}/5.0/`;
const res = await request({url});
assert.strictEqual(res.status, 200);
assert.strictEqual(res.data, contents);
});

it('should ignore query strings', async () => {
const url = `${rootUrl}/index.html?a=b`;
const res = await request({url});
assert.strictEqual(res.status, 200);
assert.strictEqual(res.data, contents);
});

it('should ignore query strings in a directory', async () => {
const url = `${rootUrl}/?a=b`;
const res = await request({url});
assert.strictEqual(res.status, 200);
assert.strictEqual(res.data, contents);
});
});
13 changes: 0 additions & 13 deletions test/zcli.ts

This file was deleted.

3 changes: 1 addition & 2 deletions tsconfig.json
@@ -7,7 +7,6 @@
},
"include": [
    "src/*.ts",
-    "test/*.ts",
-    "system-test/*.ts"
+    "test/*.ts"
]
}