The 5 dimensions of an npm dependency
June 16, 2016
We often talk about the growing number of npm dependencies, and how they make us productive and fast on one hand, but fragile and potentially insecure on the other. But what exactly is an npm dependency?
At Snyk, our product focuses on securing dependencies, so we had to define what exactly a dependency is in the first place. This post covers the different dimensions of a dependency and shares what we learned while trying to define a simple taxonomy, to help you wrap your head around how dependencies can be grouped.
Basic definition: code you depend on
At its most basic, a dependency is just that - a package of code your application depends on. Without this code, your application will not work correctly, and perhaps will not even build.
Because of that, no matter how you organize them, every dependency affects your application’s functionality, reliability and security in some way. With that in mind, let’s see how we can slice and dice them.
Dimension 1: Dev vs Prod
The first and most explicit dependency type is dev vs prod. In the package.json file, production dependencies are explicitly listed under dependencies, while dependencies used only during development are listed under devDependencies.
The npm install command, by default, will install both dev and prod dependencies for the current app, but will only install production dependencies of any package downloaded from npm.
For example, the npm util package uses the following dependency sections of its package.json file:
{
  "name": "util",
  ...
  "dependencies": {
    "inherits": "2.0.1"
  },
  ...
  "devDependencies": {
    "zuul": "~1.0.9"
  },
  ...
}
If you run npm install util, only the inherits (production) dependency will be installed with it. However, if you clone its repository, node-util, and run npm install in the cloned folder, both inherits and zuul will be installed.
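As a rough illustration of that default behavior, here is a minimal sketch. The helper name declaredInstallSet and its isRoot flag are hypothetical, not part of npm; the sketch only looks at the names declared in a manifest and ignores peer and optional dependencies.

```javascript
// Hypothetical helper, not an npm API: which declared dependencies get
// installed by default.
//   isRoot = true  -> the app you ran `npm install` in (prod + dev)
//   isRoot = false -> a package downloaded from npm (prod only)
function declaredInstallSet(manifest, isRoot) {
  const prod = Object.keys(manifest.dependencies || {});
  const dev = isRoot ? Object.keys(manifest.devDependencies || {}) : [];
  return prod.concat(dev);
}

// Trimmed copy of the util manifest shown above.
const utilManifest = {
  name: 'util',
  dependencies: { inherits: '2.0.1' },
  devDependencies: { zuul: '~1.0.9' },
};

console.log(declaredInstallSet(utilManifest, false)); // ['inherits']
console.log(declaredInstallSet(utilManifest, true)); // ['inherits', 'zuul']
```

The first call mirrors npm install util, the second mirrors running npm install inside the cloned repository.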
As stated, this separation is made explicitly by an application, and has pretty straightforward logic. If the dependency is needed for the application to run, it should be a production dependency. If it’s only needed for test or build, it should be a dev one. When searching for vulnerabilities, dev dependencies matter little, and so at Snyk we only test production dependencies by default (though you can change that using the --dev flag).
Note that peerDependencies and optionalDependencies are also production dependencies, with a certain twist as to when and how they’re installed. We’ll touch on those when discussing the Logical vs Disk dimension.
The image above shows a sample ratio of dev & prod dependencies, by bitHound.
Dimension 2: Direct vs Indirect
Some of your dependencies are Direct (also called Primary), explicitly requested in your package.json file. Most dependencies, however, are Indirect (a.k.a. Secondary), pulled in by a Direct dependency (or another Indirect one) to complete its task; for most applications they make up the vast majority of the overall list. For instance, very few applications use left-pad directly, but a few high profile ones (like babel and node) do. This made left-pad an Indirect dependency of a vast number of applications, which is why its unpublishing was so impactful.
When a package is very far removed from your app, nested deep in the dependency tree, it’s easy to be unaware of it or forget about it altogether. However, even remote dependencies are still dependencies. The removal of left-pad broke applications, including ones whose authors weren’t even aware they were using it; not adhering to a deep dependency’s license can still cause legal problems; and a vulnerability in a distant indirect dependency can still let an attacker in.
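To make the distinction concrete, here is a minimal sketch that walks a logical dependency tree and sorts each package+version into Direct or Indirect. The tree shape, the classify function, and the my-app and build-tool names are all assumptions made up for illustration; the versions are placeholders too.

```javascript
// Hypothetical tree shape: { name, version, dependencies: [ ...same shape ] }
function classify(root) {
  const direct = new Set();
  const indirect = new Set();

  function walk(node, depth) {
    for (const dep of node.dependencies || []) {
      const id = dep.name + '@' + dep.version;
      // Depth 0 means the root app asked for it explicitly.
      (depth === 0 ? direct : indirect).add(id);
      walk(dep, depth + 1);
    }
  }

  walk(root, 0);
  // A package can land in both lists if it is requested directly
  // and is also pulled in by another dependency.
  return { direct: [...direct].sort(), indirect: [...indirect].sort() };
}

// Illustrative tree: the app never mentions left-pad, yet depends on it.
const myApp = {
  name: 'my-app',
  version: '1.0.0',
  dependencies: [
    {
      name: 'build-tool',
      version: '1.0.0',
      dependencies: [{ name: 'left-pad', version: '1.0.0', dependencies: [] }],
    },
  ],
};

console.log(classify(myApp));
// direct: ['build-tool@1.0.0'], indirect: ['left-pad@1.0.0']
```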
Dimension 3: Package vs Version
Say you’re using the request package at version 2.11.3. What is your dependency then? Is it request, the package, or is it request@2.11.3, the specific version?
The full answer is both. Quite clearly, you’re dependent on request@2.11.3. This version represents an immutable program, which you pulled in and are using in your code. Any flaws in this version, like this Remote Memory Exposure vulnerability, will impact your code; you are on the hook to adhere to the MIT license it was made available under, and so on.
However, you’re also dependent on the request package as a project. If you’re using it via a semver range, you’re relying on its authors not to release a breaking change in a minor or patch version. From a security perspective, you’re relying on them not to leak their npm or GitHub credentials, to test for security issues ahead of time, and to fix disclosed vulnerabilities quickly. Over time, you’re also relying on the project and its authors to maintain it well, fixing bugs and adding features in a timely manner. You can reduce this risk by using shrinkwrap to freeze the package versions you use, or by bundling dependencies in, but in both cases you’ll stop getting new features and bug fixes.
While both package and version are your dependencies, you’re dependent on them in very different ways, so it would help to give each a different name. Unfortunately, there is no explicit name for a package+version combo, and the term package is used interchangeably for either one.
At Snyk, when we say dependency we typically refer to a package+version combo, e.g. request@2.11.3. When we want to refer to the package, we explicitly say a dependent package. That said, taxonomy on this one is tricky, so while we try to stick to these guidelines, we sometimes just say “package” and let the reader decide what we mean based on context.
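A small sketch of that naming split: the hypothetical parseSpec helper below separates a package+version combo into its package name and its version. It is an illustration only and does not try to handle semver ranges or tags.

```javascript
// Hypothetical helper: split "name@version" into its two parts.
function parseSpec(spec) {
  // lastIndexOf keeps the leading "@" of a scoped name such as "@scope/name".
  const at = spec.lastIndexOf('@');
  if (at <= 0) {
    return { name: spec, version: null }; // a bare package, no version given
  }
  return { name: spec.slice(0, at), version: spec.slice(at + 1) };
}

console.log(parseSpec('request@2.11.3')); // name 'request', version '2.11.3'
console.log(parseSpec('request')); // name 'request', version null
```

The first result is what this post calls a dependency, the second a dependent package.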
Packages like request have many versions. You depend on the quality of each, and on the project, to manage them well.
Dimension 4: Logical vs Disk
Everything we discussed so far referred to logical dependencies - what your dependency tree looks like in concept. However, the tree can change substantially when it actually gets downloaded to disk. This is partly due to peer and optional dependencies, which have a somewhat opportunistic install, but it’s even more impacted by npm3’s deduplication.
Let’s look at the inflight package. Here’s the logical dependency tree for it:
inflight@1.0.5
├─┬ once@1.3.3
│ └── wrappy@1.0.2
└── wrappy@1.0.2
The wrappy@1.0.2 dependency is used both as a direct dependency and as an indirect one via once@1.3.3. If we clone the inflight repo and run npm install --production and npm ls, we’ll get this:
inflight@1.0.5
├── once@1.3.3
└── wrappy@1.0.2
As you can see, wrappy@1.0.2 only shows up once. This is the work of npm3’s deduplication, which identifies the repeated package and avoids creating another copy on disk. Some minimal deduplication runs by default with npm2 as well, and it can be invoked explicitly by running npm dedupe.
Our example keeps things simple, but things can get hairy and less predictable when you throw in semver ranges and repeated npm installs. You can see your logical vs disk dependencies by using snyk-resolve, which Remy wrote about on this blog.
For this dimension, the key takeaway is to keep in mind that your Logical and Disk dependencies may differ, and that Disk dependencies depend on the installation logic and order. Make sure to review what was actually installed for the application as a whole, and not just the logic of each separate direct dependency.
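The inflight example can be sketched in a few lines. This is a deliberately simplified model that assumes deduplication keeps exactly one copy per distinct package+version; real npm behavior also depends on semver ranges and install order, as noted above. The tree shape and function names are hypothetical.

```javascript
// Hypothetical tree shape: { id: 'name@version', deps: [ ...same shape ] }
const inflight = {
  id: 'inflight@1.0.5',
  deps: [
    { id: 'once@1.3.3', deps: [{ id: 'wrappy@1.0.2', deps: [] }] },
    { id: 'wrappy@1.0.2', deps: [] },
  ],
};

// Logical view: every node below the root, in tree order.
function logicalNodes(node) {
  return node.deps.flatMap((dep) => [dep.id, ...logicalNodes(dep)]);
}

// Simplified disk view: one copy per distinct name@version.
function diskEntries(node) {
  return [...new Set(logicalNodes(node))].sort();
}

console.log(logicalNodes(inflight)); // once@1.3.3, wrappy@1.0.2, wrappy@1.0.2
console.log(diskEntries(inflight)); // once@1.3.3, wrappy@1.0.2
```

Three logical nodes collapse into two entries on disk, matching the two trees shown for inflight.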
Dimension 5: Dependency Path vs Unique Dependency
Now that we’ve defined our dependencies, the last dimension deals with counting them. Consider the following logical dependency tree:
app@1.2.3
├─┬ A@1.0.0
│ └── B@1.0.0
├─┬ C@1.0.0
│ └── B@2.0.0
└── B@2.0.0
Looking at this tree, we can say the app has 3 dependent packages - A, B and C. We can also say it has 4 disk dependencies (including version) - A@1.0.0, B@1.0.0, B@2.0.0 and C@1.0.0 - as deduplication would avoid redundancies. But how many logical dependencies does it have? Does it have 4 dependencies, one for each package+version combo, or 5 dependencies, one for each dependency tree node?
To separate the two, it helps to say this app has 4 unique dependencies, and it has 5 dependency paths. At Snyk, if, for instance, B@2.0.0 has a known vulnerability, we would say the app has one known vulnerability, but two vulnerable paths.
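The counts above can be reproduced with a short sketch. The tree shape and the dependencyPaths function are assumptions for illustration, and marking B@2.0.0 as vulnerable is the hypothetical from the text, not real vulnerability data.

```javascript
// The logical tree from above, in a hypothetical { id, deps } shape.
const app = {
  id: 'app@1.2.3',
  deps: [
    { id: 'A@1.0.0', deps: [{ id: 'B@1.0.0', deps: [] }] },
    { id: 'C@1.0.0', deps: [{ id: 'B@2.0.0', deps: [] }] },
    { id: 'B@2.0.0', deps: [] },
  ],
};

// One entry per tree node below the root; each entry is the full path
// from the root down to that node.
function dependencyPaths(node, prefix = [node.id]) {
  return node.deps.flatMap((dep) => {
    const path = [...prefix, dep.id];
    return [path, ...dependencyPaths(dep, path)];
  });
}

const paths = dependencyPaths(app);
const last = (path) => path[path.length - 1];

const unique = new Set(paths.map(last));
const packages = new Set([...unique].map((id) => id.slice(0, id.lastIndexOf('@'))));
const vulnerablePaths = paths.filter((path) => last(path) === 'B@2.0.0');

console.log(packages.size); // 3 dependent packages
console.log(unique.size); // 4 unique dependencies
console.log(paths.length); // 5 dependency paths
console.log(vulnerablePaths.length); // 2 vulnerable paths
```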
Summary
To summarize, there are multiple dimensions to your set of dependencies, and each one is better suited for different purposes. When discussing dependencies, we should try to maintain the same taxonomy whenever possible, to keep conversations smooth.
As a cheat sheet, here are the 5 dimensions again:
Dev vs Prod: Your app needs Dev dependencies to build and test, and Prod dependencies to run.
Direct vs Indirect: Your app only explicitly requires Direct dependencies, but your quality, legal and security reviews should cover the (larger number of) Indirect dependencies as well.
Package vs Version: Your deployed app is impacted by the specific Version of each dependency, but your project relies on each dependent Package to keep working.
Logical vs Disk: The Logical tree of dependencies in your app can change substantially when installed on Disk, so be sure to review the versions that were actually installed.
Path vs Unique: When counting your dependencies, be sure to separate the number of unique dependencies from dependency paths, to properly estimate the size of a task or problem.
Note: you're welcome to consult a more in-depth article about npm and yarn package manifests and how lock files work for applications and libraries.
Now that you know the lingo, you can use Snyk to test your application, and find out how many vulnerable production dependencies and dependency paths it may be using. In addition, you can search for your dependent packages on our Vulnerability DB to see if they have a history of security flaws.