Comparing changes

base repository: JustinBeckwith/linkinator
base: c915959529247b1177baf975a04a3bc0ef8cca72
head repository: JustinBeckwith/linkinator
compare: 5d347f0c9ae8a32cc21b885f1ffd2104c8d3ce3d

Commits on Nov 29, 2020

  1. feat: add basic support for markdown (#188)

    This adds a `--markdown` flag that allows for basic markdown link scanning.  When passing `--markdown` on the CLI or setting the `markdown: true` option, markdown in the local directory will be rendered as HTML and scanned.
    JustinBeckwith authored Nov 29, 2020
    524f600
  2. f4822f0
  3. bae9d38
  4. ce649d4

Commits on Dec 1, 2020

  1. e70dff6
  2. 8913f87

Commits on Dec 2, 2020

  1. 429b325

Commits on Dec 3, 2020

  1. b47f4b6
  2. c40be4b
  3. chore(deps): update dependency execa to v5 (#201)

    Co-authored-by: Renovate Bot <bot@renovateapp.com>
    renovate[bot] and renovate-bot authored Dec 3, 2020
    c7fa9ad

Commits on Dec 6, 2020

  1. 8d8472a

Commits on Dec 7, 2020

  1. 0c8cd4b

Commits on Dec 21, 2020

  1. 217074e

Commits on Dec 22, 2020

  1. 7c84936

Commits on Dec 24, 2020

  1. feat: add verbosity flag to CLI (#214)

    This adds a --verbosity flag, which defaults to WARNING. Skipped links now are hidden by default, unless verbosity is set to INFO or DEBUG.
    JustinBeckwith authored Dec 24, 2020
    d20cff5
  2. cf29469
  3. 9eb5590
  4. a8c0a43

Commits on Dec 26, 2020

  1. fee112b
  2. 679d64f

Commits on Dec 28, 2020

  1. c752724
  2. 6e49545

Commits on Dec 29, 2020

  1. 6f8d65a
  2. feat: support directory listings (#225)

    In addition to providing the directory listing flag, this swaps the underlying HTTP server from `serve-static` to `serve-handler`.  There should be no user facing changes for that swap.
    JustinBeckwith authored Dec 29, 2020
    39cf9d2
  3. a7d8625

Commits on Dec 30, 2020

  1. d7c4758
  2. 96a8750
  3. 936af89

Commits on Jan 3, 2021

  1. feat: introduce retry-after detection (#221)

    This introduces a `--retry` flag which, when passed, automatically retries requests that come back with an HTTP 429 and a `retry-after` header. I tested this against GitHub, and it appears to work as expected.
    JustinBeckwith authored Jan 3, 2021
    cebea21

Commits on Jan 4, 2021

  1. 1850490

Commits on Jan 5, 2021

  1. fix: map paths in results back to filesystem (#231)

    Fixes #166.  This updates the returned paths in the results to map to the filesystem if a local path was given.
    JustinBeckwith authored Jan 5, 2021
    5f7bb18

Commits on Jan 7, 2021

  1. fix(deps): update dependency meow to v9 (#232)

    Co-authored-by: Renovate Bot <bot@renovateapp.com>
    renovate[bot] and renovate-bot authored Jan 7, 2021
    6374c60

Commits on Jan 10, 2021

  1. 5d347f0
14 changes: 7 additions & 7 deletions .github/workflows/ci.yaml
@@ -10,14 +10,17 @@ jobs:
strategy:
fail-fast: false
matrix:
node: [10, 12, 14]
node: [10, 12, 14, 15]
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: ${{ matrix.node }}
- run: npm install
- run: npm test
- uses: codecov/codecov-action@v1
with:
name: actions ${{ matrix.node }}
windows:
runs-on: windows-latest
steps:
@@ -36,19 +39,15 @@ jobs:
node-version: 12
- run: npm install
- run: npm run lint
coverage:
docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 12
- run: npm install
- run: npm test
- run: npm run codecov
- uses: codecov/codecov-action@v1
with:
token: ${{ secrets.CODECOV_TOKEN }}
- run: npm run docs-test
release:
if: github.ref == 'refs/heads/master'
runs-on: ubuntu-latest
@@ -60,6 +59,7 @@ jobs:
node-version: 12
- run: npm install
- run: npm run compile
- run: npm run build-binaries
- run: npx semantic-release
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@ package-lock.json
.nyc_output
build/
coverage
.vscode
5 changes: 3 additions & 2 deletions .mocharc.json
@@ -1,6 +1,7 @@
{
"check-leaks": true,
"timeout": 10000,
"timeout": 2000,
"throw-deprecation": true,
"enable-source-maps": true
"enable-source-maps": true,
"exit": true
}
3 changes: 3 additions & 0 deletions .releaserc.json
@@ -0,0 +1,3 @@
{
"assets": "build/binaries/*"
}
138 changes: 112 additions & 26 deletions README.md
@@ -2,65 +2,85 @@
> A super simple site crawler and broken link checker.
[![npm version](https://img.shields.io/npm/v/linkinator.svg)](https://www.npmjs.org/package/linkinator)
[![Build Status](https://api.cirrus-ci.com/github/JustinBeckwith/linkinator.svg)](https://cirrus-ci.com/github/JustinBeckwith/linkinator)
[![codecov](https://codecov.io/gh/JustinBeckwith/linkinator/branch/master/graph/badge.svg)](https://codecov.io/gh/JustinBeckwith/linkinator)
[![Dependency Status](https://img.shields.io/david/JustinBeckwith/linkinator.svg)](https://david-dm.org/JustinBeckwith/linkinator)
[![Known Vulnerabilities](https://snyk.io/test/github/JustinBeckwith/linkinator/badge.svg)](https://snyk.io/test/github/JustinBeckwith/linkinator)
[![Build Status](https://img.shields.io/github/workflow/status/JustinBeckwith/linkinator/ci/master)](https://github.com/JustinBeckwith/linkinator/actions?query=branch%3Amaster+workflow%3Aci)
[![codecov](https://img.shields.io/codecov/c/github/JustinBeckwith/linkinator/master)](https://codecov.io/gh/JustinBeckwith/linkinator)
[![Known Vulnerabilities](https://img.shields.io/snyk/vulnerabilities/github/JustinBeckwith/linkinator)](https://snyk.io/test/github/JustinBeckwith/linkinator)
[![Code Style: Google](https://img.shields.io/badge/code%20style-google-blueviolet.svg)](https://github.com/google/gts)
[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/semantic-release/semantic-release)


Behold my latest inator! The `linkinator` provides an API and CLI for crawling websites and validating links. It's got a ton of sweet features:
- 🔥Easily perform scans on remote sites or local files
- 🔥Scan any element that includes links, not just `<a href>`
- 🔥Supports redirects, absolute links, relative links, all the things
- 🔥Configure specific regex patterns to skip
- 🔥 Easily perform scans on remote sites or local files
- 🔥 Scan any element that includes links, not just `<a href>`
- 🔥 Supports redirects, absolute links, relative links, all the things
- 🔥 Configure specific regex patterns to skip
- 🔥 Scan markdown files without transpilation

## Installation

```sh
$ npm install linkinator
```

Not into the whole node.js or npm thing? You can also download a standalone binary that bundles node, linkinator, and anything else you need. See [releases](https://github.com/JustinBeckwith/linkinator/releases).

## Command Usage

You can use this as a library, or as a CLI. Let's see the CLI!

```sh
$ linkinator LOCATION [ --arguments ]
```
$ linkinator LOCATIONS [ --arguments ]
Positional arguments
LOCATION
Required. Either the URL or the path on disk to check for broken links.
LOCATIONS
Required. Either the URLs or the paths on disk to check for broken links.
Supports multiple paths, and globs.
Flags
--concurrency
The number of connections to make simultaneously. Defaults to 100.
--config
Path to the config file to use. Looks for `linkinator.config.json` by default.
--concurrency
The number of connections to make simultaneously. Defaults to 100.
--directory-listing
Include an automatic directory index file when linking to a directory.
Defaults to 'false'.
--recurse, -r
Recursively follow links on the same root domain.
--format, -f
Return the data in CSV or JSON format.
--skip, -s
List of urls in regexy form to not include in the check.
--help
Show this command.
--include, -i
List of urls in regexy form to include. The opposite of --skip.
--format, -f
Return the data in CSV or JSON format.
--markdown
Automatically parse and scan markdown if scanning from a location on disk.
--recurse, -r
Recursively follow links on the same root domain.
--retry
Automatically retry requests that return HTTP 429 responses and include
a 'retry-after' header. Defaults to false.
--silent
Only output broken links.
--server-root
When scanning a local directory, customize the location on disk
where the server is started. Defaults to the path passed in [LOCATION].
--skip, -s
List of urls in regexy form to not include in the check.
--timeout
Request timeout in ms. Defaults to 0 (no timeout).
--help
Show this command.
--verbosity
Override the default verbosity for this command. Available options are
'debug', 'info', 'warning', 'error', and 'none'. Defaults to 'warning'.
```

### Command Examples
@@ -101,6 +121,18 @@ Maybe you're going to pipe the output to another program. Use the `--format` op
$ linkinator ./docs --format CSV
```

Let's make sure the `README.md` in our repo doesn't have any busted links:

```sh
$ linkinator ./README.md --markdown
```

You know what, we better check all of the markdown files!

```sh
$ linkinator "**/*.md" --markdown
```

### Configuration file
You can pass options directly to the `linkinator` CLI, or you can define a config file. By default, `linkinator` will look for a `linkinator.config.json` file in the current working directory.

@@ -113,6 +145,8 @@ All options are optional. It should look like this:
"silent": true,
"concurrency": 100,
"timeout": 0,
"markdown": true,
"directoryListing": true,
"skip": "www.googleapis.com"
}
```
@@ -123,16 +157,43 @@ To load config settings outside the CWD, you can pass the `--config` flag to the
$ linkinator --config /some/path/your-config.json
```

## GitHub Actions
You can use `linkinator` as a GitHub Action as well, using [JustinBeckwith/linkinator-action](https://github.com/JustinBeckwith/linkinator-action):

```yaml
on:
push:
branches:
- main
pull_request:
name: ci
jobs:
linkinator:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: JustinBeckwith/linkinator-action@v1
with:
paths: README.md
```
To see all options or to learn more, visit [JustinBeckwith/linkinator-action](https://github.com/JustinBeckwith/linkinator-action).
## API Usage
#### linkinator.check(options)
Asynchronous method that runs a site wide scan. Options come in the form of an object that includes:
- `path` (string) - A fully qualified path to the url to be scanned, or the path to the directory on disk that contains files to be scanned. *required*.
- `path` (string|string[]) - A fully qualified path to the url to be scanned, or the path(s) to the directory on disk that contains files to be scanned. *required*.
- `concurrency` (number) - The number of connections to make simultaneously. Defaults to 100.
- `port` (number) - When the `path` is provided as a local path on disk, the `port` on which to start the temporary web server. Defaults to a random high-range port.
- `recurse` (boolean) - By default, all scans are shallow. Only the top level links on the requested page will be scanned. By setting `recurse` to `true`, the crawler will follow all links on the page, and continue scanning links **on the same domain** for as long as it can go. Results are cached, so no worries about loops.
- `retry` (boolean|RetryConfig) - Automatically retry requests that respond with an HTTP 429, and include a `retry-after` header. The `RetryConfig` option is a placeholder for fine-grained controls to be implemented at a later time, and is only included here to signal forward-compatibility.
- `serverRoot` (string) - When scanning a local directory, customize the location on disk
where the server is started. Defaults to the path passed in `path`.
- `timeout` (number) - By default, requests made by linkinator do not time out (or follow the settings of the OS). This option (in milliseconds) will fail requests after the configured amount of time.
- `markdown` (boolean) - Automatically parse and scan markdown if scanning from a location on disk.
- `linksToSkip` (array | function) - An array of regular expression strings that should be skipped, OR an async function that's called for each link with the link URL as its only argument. Return a Promise that resolves to `true` to skip the link or `false` to check it.
- `directoryListing` (boolean) - Automatically serve a static file listing page when serving a directory. Defaults to `false`.
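
The function form of `linksToSkip` described in the list above could look something like the sketch below. The patterns and the `shouldSkip` name are made up for illustration; only the contract (receive the link URL, resolve to `true` to skip or `false` to check) comes from the option description.

```javascript
// Hypothetical skip rules for illustration; adjust to your own site.
const SKIP_PATTERNS = [/^mailto:/, /www\.googleapis\.com/];

// Follows the documented contract for `linksToSkip`: called with the link URL
// as its only argument, resolves to `true` to skip the link or `false` to
// check it.
async function shouldSkip(url) {
  return SKIP_PATTERNS.some(pattern => pattern.test(url));
}

// The function would then be passed as the `linksToSkip` option, e.g.
// linkinator.check({ path: 'https://example.com', linksToSkip: shouldSkip });
```

Because the function is async, it could also look the answer up somewhere slower (a file, an allow-list service) before resolving.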

#### linkinator.LinkChecker()
Constructor method that can be used to create a new `LinkChecker` instance. This is particularly useful if you want to receive events as the crawler crawls. Exposes the following events:
@@ -235,6 +296,31 @@ async function complex() {
complex();
```
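
For a sense of what honoring a `retry-after` header on an HTTP 429 involves, here is a minimal sketch. It is not linkinator's implementation; the `retryDelayMs` helper is hypothetical and only assumes the two value forms the HTTP specification allows for the header (a number of seconds, or an HTTP date).

```javascript
// Returns how long to wait (in ms) before retrying, or null if the response
// should not be retried. Illustrative only, not linkinator's actual code.
function retryDelayMs(status, retryAfter, now = Date.now()) {
  // Only HTTP 429 responses that carry a retry-after value are retried.
  if (status !== 429 || !retryAfter) return null;
  const value = retryAfter.trim();
  // Form 1: a non-negative integer number of seconds.
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000;
  }
  // Form 2: an HTTP date; wait until that moment, never a negative delay.
  const when = Date.parse(value);
  if (Number.isNaN(when)) return null;
  return Math.max(0, when - now);
}
```

A caller would typically sleep for the returned number of milliseconds and then re-issue the request, giving up when the helper returns `null`.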

## Tips & Tricks

### Using a proxy
This library supports proxies via the `HTTP_PROXY` and `HTTPS_PROXY` environment variables. This [guide](https://www.golinuxcloud.com/set-up-proxy-http-proxy-environment-variable/) provides a nice overview of how to format and set these variables.

### Globbing
You may have noticed in the example, when using a glob the pattern is encapsulated in quotes:
```sh
$ linkinator "**/*.md" --markdown
```

Without the quotes, some shells will attempt to expand the glob paths on their own. Various shells (bash, zsh) have different, somewhat unpredictable behaviors when left to their own devices. Using the quotes ensures consistent, predictable behavior by letting the library expand the pattern.

### Debugging
Oftentimes when a link fails, it's an easy-to-spot typo, or a clear 404. Other times ... you may need more details on exactly what went wrong. To see a full call stack for the HTTP request failure, use `--verbosity DEBUG`:
```sh
$ linkinator https://jbeckwith.com --verbosity DEBUG
```

### Controlling Output
The `--verbosity` flag offers preset options for controlling the output, but you may want more control. Using [`jq`](https://stedolan.github.io/jq/) and `--format JSON`, you can do just that!
```sh
$ linkinator https://jbeckwith.com --verbosity DEBUG --format JSON | jq '.links | .[] | select(.state | contains("BROKEN"))'
```
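
If `jq` isn't available, the same filtering can be sketched in a few lines of JavaScript. This assumes the JSON report has the shape the `jq` filter above implies: a top-level `links` array whose entries carry a `state` string. The `url` field and the sample data are assumptions for illustration.

```javascript
// Keep only the results whose `state` contains "BROKEN", mirroring the jq
// filter `.links | .[] | select(.state | contains("BROKEN"))`.
function brokenLinks(report) {
  return (report.links || []).filter(
    link => typeof link.state === 'string' && link.state.includes('BROKEN')
  );
}

// Hypothetical sample shaped like the JSON the jq filter expects.
const sample = {
  links: [
    { url: 'https://example.com/', state: 'OK' },
    { url: 'https://example.com/missing', state: 'BROKEN' },
  ],
};

console.log(brokenLinks(sample).map(link => link.url));
```

In practice the `report` object would come from parsing the output of `--format JSON` with `JSON.parse`.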

## License

[MIT](LICENSE)
[MIT](LICENSE.md)
24 changes: 14 additions & 10 deletions package.json
@@ -16,38 +16,42 @@
"test": "c8 mocha build/test",
"fix": "gts fix",
"codecov": "c8 report --reporter=json && codecov -f coverage/*.json",
"lint": "gts check"
"lint": "gts lint",
"build-binaries": "pkg . --out-path build/binaries",
"docs-test": "npm link && linkinator ./README.md"
},
"dependencies": {
"chalk": "^4.0.0",
"cheerio": "^1.0.0-rc.2",
"finalhandler": "^1.1.2",
"cheerio": "^1.0.0-rc.5",
"gaxios": "^4.0.0",
"glob": "^7.1.6",
"jsonexport": "^3.0.0",
"meow": "^8.0.0",
"marked": "^1.2.5",
"meow": "^9.0.0",
"p-queue": "^6.2.1",
"serve-static": "^1.14.1",
"serve-handler": "^6.1.3",
"server-destroy": "^1.0.1",
"update-notifier": "^5.0.0"
},
"devDependencies": {
"@types/chai": "^4.2.7",
"@types/cheerio": "^0.22.10",
"@types/finalhandler": "^1.1.0",
"@types/cheerio": "0.22.23",
"@types/glob": "^7.1.3",
"@types/marked": "^1.2.0",
"@types/meow": "^5.0.0",
"@types/mocha": "^8.0.0",
"@types/node": "^12.7.12",
"@types/serve-static": "^1.13.3",
"@types/serve-handler": "^6.1.0",
"@types/server-destroy": "^1.0.0",
"@types/sinon": "^9.0.0",
"@types/update-notifier": "^5.0.0",
"c8": "^7.0.0",
"chai": "^4.2.0",
"codecov": "^3.6.1",
"execa": "^4.0.0",
"execa": "^5.0.0",
"gts": "^3.0.0",
"mocha": "^8.0.0",
"nock": "^13.0.0",
"pkg": "^4.4.9",
"semantic-release": "^17.0.0",
"sinon": "^9.0.0",
"typescript": "^4.0.0"