How much do we really know about how packages behave on the npm registry?
April 22, 2019
In the State of Open Source Security Report 2019 we shared the details of language-based package repository growth over the last few years. As we showed, npm comes out on top every year by a landslide.
At the time this article was written, npm boasted over 960,000 packages and recorded the addition of more than 250,000 packages just in the year 2018.
With that in mind, have you asked yourself how much you know about package behavior in the npm ecosystem? How many packages are connected to each other?
Let’s set out to explore today’s biggest open source package registry!
In their recently published research, Security Issues in Language-based Sofware [sic] Ecosystems (2019), Ruturaj K. Vaidya, Lorenzo De Carli, Drew Davidson, and Vaibhav Rastogi study security attacks on open source language repositories such as npm and PyPI, focusing mainly on malicious packages and how they are affected by the unique characteristics of each language ecosystem.
I found the study very interesting and wanted to turn some of its data into questions I could ask on a public medium such as Twitter, to gauge how JavaScript developers perceive the npm ecosystem with regard to package health and connections with other packages.
Misconceptions about packages in the npm registry
From this research paper, I extracted data points that correlate with the following questions to post as a general survey:
What percentage of packages on npm have no dependencies and no dependents?
What is the average depth of a package dependency chain on npm?
How many packages on npm could be considered abandoned? (Based on a metric such as a lack of package releases in the last 12 months.)
Having selected these intriguing questions, I set out to post the polls on Twitter and gather responses.
Orphan packages on npm
First question: What percentage of packages on npm have no dependencies and no dependents?
959,567 packages in npm
What is the percent of packages on #npm with no dependencies and no dependents? #javascript #nodejs #npmjs Please RT! (Liran Tal, @liran_tal, April 15, 2019)
The correct answer is: 28% of packages on npm have no dependencies and no dependents.
As the poll results show, 66% of respondents believe that the npm registry is a very convoluted web of package connections. According to the research paper, however, 28% of all packages on the npm registry have no dependencies and no dependents, an answer that only 7% of respondents chose.
For the PyPI repository, this number increases to 36%.
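To make the definition concrete, here is a minimal sketch of how such isolated packages could be counted. The package names and the tiny graph are made up for illustration, and this is not the method used in the paper; it only shows what "no dependencies and no dependents" means on a dependency graph.

```python
# Toy dependency graph: package -> packages it depends on.
# All names are hypothetical and exist only for this illustration.
dependencies = {
    "app": ["http-lib", "logger"],
    "http-lib": ["url-parse"],
    "logger": [],
    "url-parse": [],
    "lonely-util": [],
    "old-experiment": [],
}


def isolated_packages(deps):
    """Return packages that have no dependencies and no dependents."""
    # Every package that appears as somebody's dependency has a dependent.
    has_dependents = {dep for targets in deps.values() for dep in targets}
    return sorted(
        name
        for name, targets in deps.items()
        if not targets and name not in has_dependents
    )


isolated = isolated_packages(dependencies)
share = 100 * len(isolated) / len(dependencies)
print(isolated, f"{share:.0f}% of packages are isolated")
```

Note that `logger` and `url-parse` have no dependencies of their own but are still connected, because other packages depend on them.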
Dependency chain depth on npm
Second question: What is the average depth of a package dependency chain on npm?
#npmjs features 961,600 packages
Average depth of a package dependency chain on npm is: #javascript #nodejs #npmjs Please RT! (Liran Tal, @liran_tal, April 17, 2019)
The correct answer is: on average, a package dependency chain on npm is 4.39 packages deep.
The report finds that the dependency tree depth, being the length of the longest dependency chain, is on average more than four packages deep. This is in comparison to PyPI, where this number is only 1.7.
One assumption we can make from this is that JavaScript developers favor creating smaller reusable units of code, and indeed reuse them across projects.
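As a rough sketch of what "depth" means here, the snippet below computes the longest dependency chain for each package in a small, made-up graph and then averages the results. I am assuming depth is counted in edges, so a package with no dependencies has depth 0; the paper may count it differently, and the package names are hypothetical.

```python
# Toy dependency graph: package -> packages it depends on (hypothetical names).
dependencies = {
    "app": ["http-lib", "logger"],
    "http-lib": ["url-parse"],
    "url-parse": ["querystring-lite"],
    "querystring-lite": [],
    "logger": [],
}


def chain_depth(deps, root):
    """Length, in edges, of the longest dependency chain starting at root."""
    memo = {}

    def depth(name, path):
        if name in memo:
            return memo[name]
        if name in path:
            # Assumption: cycles are treated as an error in this sketch.
            raise ValueError(f"dependency cycle detected at {name!r}")
        children = deps.get(name, [])
        result = 0
        if children:
            result = 1 + max(depth(child, path | {name}) for child in children)
        memo[name] = result
        return result

    return depth(root, frozenset())


depths = {name: chain_depth(dependencies, name) for name in dependencies}
average_depth = sum(depths.values()) / len(depths)
print(depths, average_depth)
```

In this toy graph the longest chain runs from `app` through `http-lib` and `url-parse` to `querystring-lite`, while `logger` contributes a depth of 0 to the average.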
Abandoned packages on npm
Third question: How many packages on npm could be considered abandoned?
One last and interesting poll!
How many packages on #npm could be considered abandoned [1]? [1] based on a metric such as a lack of a release in the last 12 months. Please RT. #javascript #nodejs #npmjs #opensource (Liran Tal, @liran_tal, April 18, 2019)
The correct answer is: 61% of packages on npm did not publish a release in the last 12 months.
Continuing from prior research, this study defined abandoned packages based on a release metric. In other words, a package whose maintainer published no release in the last 12 months was considered abandoned.
Distinguishing between unmaintained packages and feature-complete packages that have simply reached a maturity stage requiring no further releases is not easy. Even though this metric may be questioned by some, the actual numbers are staggering.
The report found about 496,000 packages on npm (out of ~801,000 at the time of the study) which did not have any releases in the last 12 months. That’s 61% of the packages on npm with no new releases at all.
On the PyPI repository, the number is similar—57% of all packages did not have a release in the last 12 months.
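The release-based metric is simple to express in code. The following is a minimal sketch, assuming we already have a list of ISO 8601 release timestamps per package; the package names, dates, and the 365-day cutoff are illustrative assumptions, not data from the study.

```python
from datetime import datetime, timedelta, timezone


def is_abandoned(release_times, now, max_age_days=365):
    """True if the newest release is older than max_age_days before `now`."""
    latest = max(datetime.fromisoformat(stamp) for stamp in release_times)
    return now - latest > timedelta(days=max_age_days)


# Hypothetical release histories, used only for illustration.
releases = {
    "steady-lib": ["2018-06-01T00:00:00+00:00", "2019-03-01T00:00:00+00:00"],
    "quiet-lib": ["2016-02-01T00:00:00+00:00"],
}

now = datetime(2019, 4, 22, tzinfo=timezone.utc)
status = {name: is_abandoned(times, now) for name, times in releases.items()}
print(status)
```

As noted above, a check like this cannot tell an unmaintained package from one that is simply finished, so it is best read as a signal rather than a verdict.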
Are abandoned packages downloaded less frequently? Not really.
As the report goes on to show, cumulative download counts for these packages approach billions in both ecosystems.
The following graph lists 20 of the most downloaded packages that we considered abandoned based on the report. These packages, such as wordwrap and is-object, account for hundreds of millions of downloads a year.
Summary
With the growth of open source software we can expect more studies on public language-based package repositories and their security.
The study concludes with the following:
It recommends improvements for package repositories and package managers. For example, in the context of typosquatting attacks, it recommends that users be alerted when they install a package other than the one they intended to install.
The nature of these ecosystems suggests that they are "ripe for exploitation, and the number of incidents will only increase in the future", to quote the report.
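To illustrate the kind of alert the first recommendation describes, here is a minimal sketch of a typosquatting check that compares a requested name against a short list of well-known package names using edit distance. The list, the distance threshold, and the function names are my own assumptions for illustration; this is not how npm or any existing package manager implements such a check.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute
                )
            )
        previous = current
    return previous[-1]


# Hypothetical list of popular names a package manager might protect.
POPULAR = ["express", "lodash", "react", "cross-env"]


def possible_typosquat(name, popular=POPULAR, max_distance=1):
    """Return the popular package that `name` closely resembles, if any."""
    if name in popular:
        return None  # exact match: the user asked for the real package
    for known in popular:
        if edit_distance(name, known) <= max_distance:
            return known
    return None


for requested in ["expres", "crossenv", "lodash", "left-pad"]:
    match = possible_typosquat(requested)
    if match:
        print(f"Warning: '{requested}' looks similar to '{match}'. Did you mean that?")
```

A threshold of one edit keeps false alarms low in this sketch, at the cost of missing lookalike names that differ by more than a single character.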
Use open source. Stay secure.