Comparing changes

base repository: JustinBeckwith/linkinator
base: 5d347f0c9ae8a32cc21b885f1ffd2104c8d3ce3d
...
head repository: JustinBeckwith/linkinator
compare: ea89b421e1ddabdfd00263800e85ef7a3a2020d8

Commits on Jan 10, 2021

  1. 9ff9e9a
  2. build: disable package-lock.json properly (#236)

    The default setting is to always generate a package-lock.json file, so this step is required to prevent its generation.

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 10, 2021
    e6c539a
  3. docs: README.md Markdown tweaks (#238)

    * add newlines before/after code blocks and lists
    * remove dollar sign from snippets since it makes copying harder and the commands don't have any output
    * fix headings hierarchy

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 10, 2021
    105d783
  4. build: add CodeQL scanning (#234)

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 10, 2021
    b6ca492
  5. build: update CI config (#235)

    * move Node.js version to an environment variable
    * update to `actions/setup-node@v2`
    XhmikosR authored Jan 10, 2021
    4f12838

Commits on Jan 11, 2021

  1. 71d46aa
  2. b78be5d
  3. build: fix release action (#240)

    When we are not on the upstream repo, don't run the release action

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 11, 2021
    071a220

Commits on Jan 22, 2021

  1. 0433251

Commits on Jan 23, 2021

  1. f14c912

Commits on Jan 24, 2021

  1. 026a012
  2. cb1d808
  3. 1b35af6
  4. build: remove npm link from docs-test too (#255)

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 24, 2021
    43cb074
  5. build: CI: add caching for Windows too (#253)

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 24, 2021
    54cfe7d
  6. chore: README.md: remove .svg from badges (#252)

    It's the default

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 24, 2021
    a5ac753
  7. build: CI: switch to Node.js 14 (#254)

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 24, 2021
    8c6b6cc
  8. 9d98693
  9. build: CI: specify FORCE_COLOR: 2 (#257)

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Jan 24, 2021
    5d0cdcd

Commits on Jan 25, 2021

  1. 4e17c8e
  2. 187aea1

Commits on Feb 5, 2021

  1. 2fb77fb

Commits on Feb 7, 2021

  1. fix: use custom HTTP server (#265)

    This switches from `vercel/serve-handler` to a custom local HTTP static web server.
    JustinBeckwith authored Feb 7, 2021
    9b0b206
  2. 668aad6
  3. 39816c5
  4. build: remove the cache fallback (#268)

    Co-authored-by: Justin Beckwith <beckwith@google.com>
    XhmikosR and JustinBeckwith authored Feb 7, 2021
    d02484d

Commits on Feb 8, 2021

  1. fix(deps): update dependency marked to v2 [security] (#271)

    Co-authored-by: Renovate Bot <bot@renovateapp.com>
    renovate[bot] and renovate-bot authored Feb 8, 2021
    f9c13e9

Commits on Feb 12, 2021

  1. chore(deps): update dependency @types/cheerio to v0.22.24 (#273)

    Co-authored-by: Renovate Bot <bot@renovateapp.com>
    renovate[bot] and renovate-bot authored Feb 12, 2021
    ec49cdf

Commits on Feb 17, 2021

  1. 245b3bd

Commits on Feb 21, 2021

  1. ea89b42
9 changes: 9 additions & 0 deletions .c8rc
@@ -0,0 +1,9 @@
{
  "exclude": [
    "build/test"
  ],
  "reporter": [
    "html",
    "text"
  ]
}
77 changes: 59 additions & 18 deletions .github/workflows/ci.yaml
@@ -1,9 +1,12 @@
on:
push:
branches:
- master
- main
pull_request:
name: ci
env:
FORCE_COLOR: 2
NODE: 14
jobs:
test:
runs-on: ubuntu-latest
@@ -13,51 +16,89 @@ jobs:
node: [10, 12, 14, 15]
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
- uses: actions/setup-node@v2
with:
node-version: ${{ matrix.node }}
- run: npm install
- name: Set up npm cache
uses: actions/cache@v2
with:
path: ~/.npm
key: ${{ runner.os }}-node-v${{ matrix.node }}-${{ hashFiles('package.json') }}-${{ hashFiles('package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-v${{ matrix.node }}-${{ hashFiles('package.json') }}-${{ hashFiles('package-lock.json') }}
- run: npm ci
- run: npm test
- uses: codecov/codecov-action@v1
with:
name: actions ${{ matrix.node }}
if: matrix.node == env.NODE
windows:
runs-on: windows-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
- uses: actions/setup-node@v2
with:
node-version: ${{ env.NODE }}
- name: Get npm cache directory
id: npm-cache
run: |
echo "::set-output name=dir::$(npm config get cache)"
- name: Set up npm cache
uses: actions/cache@v2
with:
node-version: 12
- run: npm install
path: ${{ steps.npm-cache.outputs.dir }}
key: ${{ runner.os }}-node-v${{ env.NODE }}-${{ hashFiles('package.json') }}-${{ hashFiles('package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-v${{ env.NODE }}-${{ hashFiles('package.json') }}-${{ hashFiles('package-lock.json') }}
- run: npm ci
- run: npm test
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
- uses: actions/setup-node@v2
with:
node-version: 12
- run: npm install
node-version: ${{ env.NODE }}
- name: Set up npm cache
uses: actions/cache@v2
with:
path: ~/.npm
key: ${{ runner.os }}-node-v${{ env.NODE }}-${{ hashFiles('package.json') }}-${{ hashFiles('package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-v${{ env.NODE }}-${{ hashFiles('package.json') }}-${{ hashFiles('package-lock.json') }}
- run: npm ci
- run: npm run lint
docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
- uses: actions/setup-node@v2
with:
node-version: ${{ env.NODE }}
- name: Set up npm cache
uses: actions/cache@v2
with:
node-version: 12
- run: npm install
path: ~/.npm
key: ${{ runner.os }}-node-v${{ env.NODE }}-${{ hashFiles('package.json') }}-${{ hashFiles('package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-v${{ env.NODE }}-${{ hashFiles('package.json') }}-${{ hashFiles('package-lock.json') }}
- run: npm ci
- run: npm run docs-test
release:
if: github.ref == 'refs/heads/master'
if: github.repository == 'JustinBeckwith/linkinator' && github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
needs: [test, lint]
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
- uses: actions/setup-node@v2
with:
node-version: ${{ env.NODE }}
- name: Set up npm cache
uses: actions/cache@v2
with:
node-version: 12
- run: npm install
path: ~/.npm
key: ${{ runner.os }}-node-v${{ env.NODE }}-${{ hashFiles('package.json') }}-${{ hashFiles('package-lock.json') }}
restore-keys: |
${{ runner.os }}-node-v${{ env.NODE }}-${{ hashFiles('package.json') }}-${{ hashFiles('package-lock.json') }}
- run: npm ci
- run: npm run compile
- run: npm run build-binaries
- run: npx semantic-release
34 changes: 34 additions & 0 deletions .github/workflows/codeql.yml
@@ -0,0 +1,34 @@
name: "CodeQL"

on:
  push:
    branches:
      - main
      - "!renovate/**"
  pull_request:
    # The branches below must be a subset of the branches above
    branches:
      - main
  schedule:
    - cron: "0 0 * * 0"

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v1
        with:
          languages: "javascript"

      - name: Autobuild
        uses: github/codeql-action/autobuild@v1

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v1
2 changes: 0 additions & 2 deletions .gitignore
@@ -1,6 +1,4 @@
node_modules/
package-lock.json
.nyc_output
build/
coverage
.vscode
3 changes: 2 additions & 1 deletion .releaserc.json
@@ -1,3 +1,4 @@
{
"assets": "build/binaries/*"
"assets": "build/binaries/*",
"branches": ["main"]
}
63 changes: 40 additions & 23 deletions README.md
@@ -1,15 +1,17 @@
# 🐿 linkinator

> A super simple site crawler and broken link checker.
[![npm version](https://img.shields.io/npm/v/linkinator.svg)](https://www.npmjs.org/package/linkinator)
[![Build Status](https://img.shields.io/github/workflow/status/JustinBeckwith/linkinator/ci/master)](https://github.com/JustinBeckwith/linkinator/actions?query=branch%3Amaster+workflow%3Aci)
[![codecov](https://img.shields.io/codecov/c/github/JustinBeckwith/linkinator/master)](https://codecov.io/gh/JustinBeckwith/linkinator)
[![npm version](https://img.shields.io/npm/v/linkinator)](https://www.npmjs.org/package/linkinator)
[![Build Status](https://img.shields.io/github/workflow/status/JustinBeckwith/linkinator/ci/main)](https://github.com/JustinBeckwith/linkinator/actions?query=branch%3Amain+workflow%3Aci)
[![codecov](https://img.shields.io/codecov/c/github/JustinBeckwith/linkinator/main)](https://codecov.io/gh/JustinBeckwith/linkinator)
[![Known Vulnerabilities](https://img.shields.io/snyk/vulnerabilities/github/JustinBeckwith/linkinator)](https://snyk.io/test/github/JustinBeckwith/linkinator)
[![Code Style: Google](https://img.shields.io/badge/code%20style-google-blueviolet.svg)](https://github.com/google/gts)
[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/semantic-release/semantic-release)
[![Code Style: Google](https://img.shields.io/badge/code%20style-google-blueviolet)](https://github.com/google/gts)
[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079)](https://github.com/semantic-release/semantic-release)


Behold my latest inator! The `linkinator` provides an API and CLI for crawling websites and validating links. It's got a ton of sweet features:

- 🔥 Easily perform scans on remote sites or local files
- 🔥 Scan any element that includes links, not just `<a href>`
- 🔥 Supports redirects, absolute links, relative links, all the things
@@ -19,7 +21,7 @@ Behold my latest inator! The `linkinator` provides an API and CLI for crawling w
## Installation

```sh
$ npm install linkinator
npm install linkinator
```

Not into the whole node.js or npm thing? You can also download a standalone binary that bundles node, linkinator, and anything else you need. See [releases](https://github.com/JustinBeckwith/linkinator/releases).
@@ -28,7 +30,7 @@ Not into the whole node.js or npm thing? You can also download a standalone bin

You can use this as a library, or as a CLI. Let's see the CLI!

```
```text
$ linkinator LOCATIONS [ --arguments ]
Positional arguments
@@ -88,52 +90,53 @@ $ linkinator LOCATIONS [ --arguments ]
You can run a shallow scan of a website for busted links:

```sh
$ npx linkinator http://jbeckwith.com
npx linkinator http://jbeckwith.com
```

That was fun. What about local files? The linkinator will stand up a static web server for yinz:

```sh
$ npx linkinator ./docs
npx linkinator ./docs
```

But that only gets the top level of links. Lets go deeper and do a full recursive scan!

```sh
$ npx linkinator ./docs --recurse
npx linkinator ./docs --recurse
```

Aw, snap. I didn't want that to check *those* links. Let's skip em:

```sh
$ npx linkinator ./docs --skip www.googleapis.com
npx linkinator ./docs --skip www.googleapis.com
```

The `--skip` parameter will accept any regex! You can do more complex matching, or even tell it to only scan links with a given domain:

```sh
$ linkinator http://jbeckwith.com --skip '^(?!http://jbeckwith.com)'
linkinator http://jbeckwith.com --skip '^(?!http://jbeckwith.com)'
```
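
To see why the negative lookahead above limits the scan to a single domain, here is a minimal sketch in plain JavaScript. It assumes a `--skip` pattern is tested against each link URL as a regular expression and that a match means "skip"; the `shouldSkip` helper and the sample URLs are hypothetical and not part of linkinator.

```js
// Hypothetical helper: returns true when the URL matches the skip pattern.
// Assumption: a link is skipped when the regex matches its URL.
function shouldSkip(pattern, url) {
  return new RegExp(pattern).test(url);
}

// The pattern from the README: it matches any URL that does NOT start with
// http://jbeckwith.com, so everything outside that prefix is skipped.
const pattern = '^(?!http://jbeckwith.com)';

// Sample URLs, made up for illustration.
const results = [
  'http://jbeckwith.com/about',
  'https://www.googleapis.com/discovery/v1/apis',
].map(url => ({url, skipped: shouldSkip(pattern, url)}));

console.log(results);
```

Note that the unescaped `.` in the pattern matches any character; writing `\.` would make the match stricter.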

Maybe you're going to pipe the output to another program. Use the `--format` option to get JSON or CSV!

```sh
$ linkinator ./docs --format CSV
linkinator ./docs --format CSV
```
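
As a rough sketch of consuming the JSON output from another program, the snippet below picks out broken links in Node.js. The `links` array and the `url`, `state`, and `status` fields follow the jq filter and the `link` event description elsewhere in this README; the sample payload itself is made up, and the exact shape of the real output is an assumption.

```js
// Made-up sample of what `linkinator ./docs --format JSON` might print.
// Assumption: the top-level object has a `links` array whose items carry
// `url` and `state` (and `status` when a request was made).
const output = JSON.stringify({
  links: [
    {url: 'http://example.com/', state: 'OK', status: 200},
    {url: 'http://example.com/missing', state: 'BROKEN', status: 404},
    {url: 'http://example.com/ignored', state: 'SKIPPED'},
  ],
});

// Parse the JSON text and keep only the links reported as broken.
function brokenLinks(jsonText) {
  const {links} = JSON.parse(jsonText);
  return links.filter(link => link.state === 'BROKEN');
}

const broken = brokenLinks(output);
console.log(broken.map(link => `${link.status} ${link.url}`));
```

In practice the text would come from the CLI's standard output rather than a hard-coded string.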

Let's make sure the `README.md` in our repo doesn't have any busted links:

```sh
$ linkinator ./README.md --markdown
linkinator ./README.md --markdown
```

You know what, we better check all of the markdown files!

```sh
$ linkinator "**/*.md" --markdown
linkinator "**/*.md" --markdown
```

### Configuration file

You can pass options directly to the `linkinator` CLI, or you can define a config file. By default, `linkinator` will look for a `linkinator.config.json` file in the current working directory.

All options are optional. It should look like this:
@@ -154,10 +157,11 @@ All options are optional. It should look like this:
To load config settings outside the CWD, you can pass the `--config` flag to the `linkinator` CLI:

```sh
$ linkinator --config /some/path/your-config.json
linkinator --config /some/path/your-config.json
```

## GitHub Actions

You can use `linkinator` as a GitHub Action as well, using [JustinBeckwith/linkinator-action](https://github.com/JustinBeckwith/linkinator-action):

```yaml
@@ -181,8 +185,10 @@ To see all options or to learn more, visit [JustinBeckwith/linkinator-action](ht
## API Usage
#### linkinator.check(options)
### linkinator.check(options)
Asynchronous method that runs a site wide scan. Options come in the form of an object that includes:
- `path` (string|string[]) - A fully qualified path to the url to be scanned, or the path(s) to the directory on disk that contains files to be scanned. *required*.
- `concurrency` (number) - The number of connections to make simultaneously. Defaults to 100.
- `port` (number) - When the `path` is provided as a local path on disk, the `port` on which to start the temporary web server. Defaults to a random high range order port.
@@ -195,15 +201,19 @@ where the server is started. Defaults to the path passed in `path`.
- `linksToSkip` (array | function) - An array of regular expression strings that should be skipped, OR an async function that's called for each link with the link URL as its only argument. Return a Promise that resolves to `true` to skip the link or `false` to check it.
- `directoryListing` (boolean) - Automatically serve a static file listing page when serving a directory. Defaults to `false`.
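
The function form of `linksToSkip` could look something like the sketch below. It follows the contract described above (called with the link URL, resolves to `true` to skip or `false` to check); the host list is hypothetical, and the sketch assumes the URL passed in is absolute.

```js
// Hypothetical skip rule: ignore links on hosts we do not control.
const skippedHosts = new Set(['www.googleapis.com']);

// Called with the link URL; resolves to true (skip) or false (check).
// Assumption: `url` is absolute, otherwise `new URL` throws.
async function linksToSkip(url) {
  return skippedHosts.has(new URL(url).hostname);
}

// Passing it to the library would look roughly like this (not run here):
//   await require('linkinator').check({path: './docs', linksToSkip});

// Exercise the function on two made-up URLs.
const decisions = Promise.all([
  linksToSkip('https://www.googleapis.com/discovery/v1/apis'),
  linksToSkip('http://jbeckwith.com/'),
]);
decisions.then(d => console.log(d));
```

A function is handy when the decision depends on more than a pattern match, for example a parsed hostname or a lookup.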

#### linkinator.LinkChecker()
### linkinator.LinkChecker()

Constructor method that can be used to create a new `LinkChecker` instance. This is particularly useful if you want to receive events as the crawler crawls. Exposes the following events:

- `pagestart` (string) - Provides the url that the crawler has just started to scan.
- `link` (object) - Provides an object with
- `url` (string) - The url that was scanned
- `state` (string) - The result of the scan. Potential values include `BROKEN`, `OK`, or `SKIPPED`.
- `status` (number) - The HTTP status code of the request.

### Simple example
### Examples

#### Simple example

```js
const link = require('linkinator');
@@ -239,7 +249,7 @@ async function simple() {
simple();
```

### Complete example
#### Complete example

In most cases you're going to want to respond to events, as running the check command can kinda take a long time.

@@ -299,26 +309,33 @@ complex();
## Tips & Tricks

### Using a proxy

This library supports proxies via the `HTTP_PROXY` and `HTTPS_PROXY` environment variables. This [guide](https://www.golinuxcloud.com/set-up-proxy-http-proxy-environment-variable/) provides a nice overview of how to format and set these variables.

### Globbing

You may have noticed in the example, when using a glob the pattern is encapsulated in quotes:

```sh
$ linkinator "**/*.md" --markdown
linkinator "**/*.md" --markdown
```

Without the quotes, some shells will attempt to expand the glob paths on their own. Various shells (bash, zsh) have different, somewhat unpredictable behaviors when left to their own devices. Using the quotes ensures consistent, predictable behavior by letting the library expand the pattern.

### Debugging

Oftentimes when a link fails, it's an easy to spot typo, or a clear 404. Other times ... you may need more details on exactly what went wrong. To see a full call stack for the HTTP request failure, use `--verbosity DEBUG`:

```sh
$ linkinator https://jbeckwith.com --verbosity DEBUG
linkinator https://jbeckwith.com --verbosity DEBUG
```

### Controlling Output

The `--verbosity` flag offers preset options for controlling the output, but you may want more control. Using [`jq`](https://stedolan.github.io/jq/) and `--format JSON` - you can do just that!

```sh
$ linkinator https://jbeckwith.com --verbosity DEBUG --format JSON | jq '.links | .[] | select(.state | contains("BROKEN"))'
linkinator https://jbeckwith.com --verbosity DEBUG --format JSON | jq '.links | .[] | select(.state | contains("BROKEN"))'
```

## License