77% of 433,000 sites use vulnerable JavaScript libraries
November 21, 2017
Last week, we released our first annual State of Open Source Security report. One of the discoveries the report mentions is that an analysis of around 433,000 sites found that 77% of them use at least one front-end JavaScript library with a known security vulnerability. This number mirrors the one we reported back in March, but thanks to Google Chrome’s Lighthouse now testing for vulnerable JavaScript libraries using Snyk, we can get much more thorough results.
Lighthouse data is collected as part of HTTP Archive, and the data is available for querying through BigQuery. As a result, we get to query Lighthouse audit data on a very large scale.
Looking at how many sites are vulnerable
The October 15th data (the most recent run available) on BigQuery contains data collected from 439,176 different URLs. After you account for URLs where Lighthouse was unable to run, or where the audit itself didn’t complete for whatever reason, we get a dataset of 418,112 different sites to query against.
The first question is how many of those sites carry known vulnerabilities. We can run a query against the reports to get that information:
SELECT
  JSON_EXTRACT_SCALAR(report, "$.audits.no-vulnerable-libraries.score") AS score,
  COUNT(0) AS volume
FROM
  [httparchive:har.latest_lighthouse_mobile]
WHERE
  report IS NOT NULL
GROUP BY
  score
HAVING
  score IS NOT NULL
ORDER BY
  score
The results are very much in line with our smaller scale study back in March: 77.3% (323,132) of those sites failed the audit. In other words, 77.3% of those sites contain at least one client-side JavaScript library with a known security vulnerability. The new version of the HTTP Archive site will report on how this changes over time.
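To make the arithmetic behind that percentage explicit, here is a minimal JavaScript sketch that turns grouped rows shaped like the query result into a share. The function name `shareWithScore` is ours, and the `'true'` volume below is not reported in this post: it is derived by subtracting the failing count from the 418,112 total, on the assumption that every site in the dataset produced a score.

```javascript
// Rows mirror the (score, volume) pairs returned by the query above.
// ASSUMPTION: the 'true' volume is 418,112 - 323,132; only the failing
// count and the total are actually stated in the text.
const auditRows = [
  { score: 'false', volume: 323132 },
  { score: 'true', volume: 94980 },
];

// Percentage of all counted sites whose audit score equals `score`.
function shareWithScore(rows, score) {
  const total = rows.reduce((sum, row) => sum + row.volume, 0);
  if (total === 0) return 0;
  const matching = rows
    .filter((row) => row.score === score)
    .reduce((sum, row) => sum + row.volume, 0);
  return (matching * 100) / total;
}

console.log(shareWithScore(auditRows, 'false').toFixed(1)); // "77.3"
```

A score of `'false'` means the site failed the audit, so the share of `'false'` rows is the share of sites carrying at least one known vulnerability.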
We can drill down even more to see how many known vulnerabilities are being carried by those libraries:
SELECT
  -- displayValue starts with the vulnerability count; keep only that leading token
  REGEXP_EXTRACT(JSON_EXTRACT_SCALAR(report, "$.audits.no-vulnerable-libraries.displayValue"), r'^\S*') AS knownVulnerabilities,
  COUNT(0) AS volume
FROM
  `httparchive.lighthouse.2017_10_15_mobile`
WHERE
  report IS NOT NULL
AND
  -- only sites that failed the audit
  JSON_EXTRACT_SCALAR(report, "$.audits.no-vulnerable-libraries.score") = 'false'
GROUP BY
  knownVulnerabilities
ORDER BY
  -- sort numerically rather than as strings
  CAST(knownVulnerabilities AS INT64)
It turns out that if you carry at least one known vulnerability, you likely carry more: 51.8% of vulnerable sites carry more than one known security vulnerability. While the majority of those sites carry one or two, the long tail is scary: 9.2% of sites carry libraries with a combined four or more known security vulnerabilities.
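Figures such as "more than one" and "four or more" come from summing the tail of the histogram that the query returns. A rough sketch of that step in JavaScript follows. The function name `shareAtLeast` is ours, and the sample rows are made-up numbers chosen only to keep the arithmetic easy to follow; they are not the real query output.

```javascript
// Percentage of counted sites carrying at least `minCount` known
// vulnerabilities, given (knownVulnerabilities, volume) histogram rows.
function shareAtLeast(rows, minCount) {
  let total = 0;
  let matching = 0;
  for (const row of rows) {
    const count = Number.parseInt(row.knownVulnerabilities, 10);
    if (Number.isNaN(count)) continue; // skip rows without a leading number
    total += row.volume;
    if (count >= minCount) matching += row.volume;
  }
  return total === 0 ? 0 : (matching * 100) / total;
}

// HYPOTHETICAL sample data, not the real results.
const sampleHistogram = [
  { knownVulnerabilities: '1', volume: 50 },
  { knownVulnerabilities: '2', volume: 30 },
  { knownVulnerabilities: '3', volume: 10 },
  { knownVulnerabilities: '4', volume: 6 },
  { knownVulnerabilities: '7', volume: 4 },
];

console.log(shareAtLeast(sampleHistogram, 2)); // 50
console.log(shareAtLeast(sampleHistogram, 4)); // 10
```

The counts arrive as strings because the query extracts them with a regular expression, which is why the sketch parses them before comparing.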
Which libraries are the most often found to be vulnerable
Using the Lighthouse audit data, we can also get an idea of which libraries are most commonly found to be vulnerable.
First, we can query to see which libraries are detected most often—whether they are vulnerable or not. The following query grabs the ten most commonly found libraries:
CREATE TEMPORARY FUNCTION getLibs(items STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  try {
    // With the g flag, match() returns whole matches such as "name":"jQuery",
    // not just the captured group.
    return items.match(/"name":"([^"]*)"/ig);
  } catch (e) {
    // Reached when items is null (for example, the field is missing).
    return [];
  }
""";

-- One row per detected library per site, then count rows per library
SELECT library, COUNT(0) Volume
FROM (
  SELECT getLibs(JSON_EXTRACT(report, "$.audits.no-vulnerable-libraries.extendedInfo.jsLibs")) AS libs
  FROM `httparchive.lighthouse.2017_10_15_mobile`
)
CROSS JOIN
  UNNEST(libs) AS library
GROUP BY library
ORDER BY Volume DESC
LIMIT 10
Library | Number of times detected | Adoption % |
---|---|---|
jQuery | 344,643 | 82.4% |
jQuery UI | 83,075 | 19.9% |
Modernizr | 63,122 | 15.1% |
Bootstrap | 57,154 | 13.7% |
yepnope | 41,537 | 9.9% |
FlexSlider | 33,002 | 7.9% |
Underscore | 17,633 | 4.2% |
Google Maps | 14,312 | 3.4% |
Moment.js | 14,038 | 3.4% |
SWFObject | 13,521 | 3.2% |
Unsurprisingly, jQuery tops the list. This is right in line with what we saw back in March, and what you would probably expect. No library yet has come close to reaching jQuery’s universal appeal. One caveat here: React is currently being underreported. Once the updated detection script has been pulled into Lighthouse, its numbers will increase (and the overall percentage of vulnerable sites will likely increase slightly as well).
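If you want to see what the `getLibs` helper in the query above actually hands back, the same expression can be run in plain JavaScript. The sketch below uses a made-up `jsLibs` payload (the exact shape of the Lighthouse field is an assumption here) and also shows a hypothetical variant, `getLibNames`, that returns only the captured names.

```javascript
// HYPOTHETICAL payload; the real extendedInfo.jsLibs shape may differ.
const sampleItems =
  '[{"name":"jQuery","version":"1.12.4"},{"name":"Bootstrap","version":"3.3.7"}]';

// Same body as the UDF in the query.
function getLibs(items) {
  try {
    return items.match(/"name":"([^"]*)"/ig);
  } catch (e) {
    return [];
  }
}

// Variant (our own, not from the query) that keeps only the captured name.
function getLibNames(items) {
  const names = [];
  if (typeof items !== 'string') return names;
  const pattern = /"name":"([^"]*)"/gi;
  let match;
  while ((match = pattern.exec(items)) !== null) {
    names.push(match[1]);
  }
  return names;
}

console.log(getLibs(sampleItems));     // [ '"name":"jQuery"', '"name":"Bootstrap"' ]
console.log(getLibNames(sampleItems)); // [ 'jQuery', 'Bootstrap' ]
console.log(getLibs(null));            // [] (null.match throws, so the catch runs)
```

Because each library still maps to one distinct string either way, grouping and counting give the same volumes; the difference only affects how the library label reads in the output.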
Now, let’s change it up and look at which libraries are found to be carrying known vulnerabilities.
CREATE TEMPORARY FUNCTION getLibs(items STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  try {
    // Returns whole matches such as "name":"jQuery", one per vulnerable library entry.
    return items.match(/"name":"([^"]*)"/ig);
  } catch (e) {
    // Reached when items is null (for example, the field is missing).
    return [];
  }
""";

-- Same shape as the previous query, but reading the vulnerabilities list
SELECT library, COUNT(0) Volume
FROM (
  SELECT getLibs(JSON_EXTRACT(report, "$.audits.no-vulnerable-libraries.extendedInfo.vulnerabilities")) AS libs
  FROM `httparchive.lighthouse.2017_10_15_mobile`
)
CROSS JOIN
  UNNEST(libs) AS library
GROUP BY library
ORDER BY Volume DESC
LIMIT 10
The top couple of names on the list are very similar.
Library | Number of times found vulnerable | % of all instances of this lib detected |
---|---|---|
jQuery | 318,786 | 92.5% |
jQuery UI | 74,486 | 89.7% |
Moment.js | 10,245 | 73.0% |
AngularJS | 7,609 | 84.8% |
Handlebars | 3,129 | 60.7% |
Mustache | 1,925 | 51.0% |
YUI 3 | 559 | 40.3% |
jQuery Mobile | 413 | 3.7% |
Knockout | 407 | 19.6% |
React | 181 | 10.2% |
Looking at the percentages doesn’t paint a rosy picture. jQuery is by far the most popular library on the web, and 92.5% of the jQuery instances found in production carry a known security vulnerability. In fact, of the ten libraries most commonly found to be carrying a known vulnerability, six are vulnerable in the majority of versions found in production.
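The last column of that table is simply the vulnerable count divided by the detection count from the earlier table. As a quick sanity check, here is a small JavaScript sketch that joins the two sets of counts for the three libraries that appear in both tables. The function name `vulnerableShare` is ours; the numbers are the ones listed above.

```javascript
// Detection counts from the "most commonly found" table.
const detectedCounts = new Map([
  ['jQuery', 344643],
  ['jQuery UI', 83075],
  ['Moment.js', 14038],
]);

// Counts from the "found vulnerable" table.
const vulnerableCounts = new Map([
  ['jQuery', 318786],
  ['jQuery UI', 74486],
  ['Moment.js', 10245],
]);

// Percentage of detected instances that were vulnerable, per library,
// formatted to one decimal place.
function vulnerableShare(detected, vulnerable) {
  const shares = new Map();
  for (const [library, vulnerableCount] of vulnerable) {
    const detectedCount = detected.get(library);
    if (!detectedCount) continue; // no detection count to divide by
    shares.set(library, ((vulnerableCount * 100) / detectedCount).toFixed(1));
  }
  return shares;
}

console.log(vulnerableShare(detectedCounts, vulnerableCounts));
// Map(3) { 'jQuery' => '92.5', 'jQuery UI' => '89.7', 'Moment.js' => '73.0' }
```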
This is the case despite the fact that every one of the libraries on this list has versions available that do not carry these vulnerabilities.
Library | Oldest Version with No Known Vulnerabilities | Release Date |
---|---|---|
jQuery | 3.0.0 | June, 2016 |
jQuery UI | 1.10.0 | January, 2013 |
Moment.js | 2.15.2 | October, 2016 |
AngularJS | 1.6.1 | December, 2016 |
Handlebars | 4.0.0 | September, 2015 |
Mustache | 2.2.1 | December, 2015 |
YUI 3 | 3.10.3 | June, 2016 |
jQuery Mobile | 1.2.0 | October, 2012 |
Knockout | 3.0.0 | October, 2013 |
React | 0.14.0 | October, 2015 |
Each of the front-end libraries most commonly found to be vulnerable has had a release free of known vulnerabilities available for anywhere from one to five years. The reality is that front-end libraries and frameworks often don’t get updated after they hit production.
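The "one to five years" range follows from the release dates in the table measured against this post's publication date (November 2017). A minimal sketch of that calculation in JavaScript, using the newest and oldest entries from the table; the helper name `monthsBetween` is ours:

```javascript
// Whole months between two { year, month } values (month is 1-12).
function monthsBetween(from, to) {
  return (to.year - from.year) * 12 + (to.month - from.month);
}

const published = { year: 2017, month: 11 };

// Newest and oldest "first safe release" dates from the table above.
const jquerySafe = { year: 2016, month: 6 };        // jQuery 3.0.0
const jqueryMobileSafe = { year: 2012, month: 10 }; // jQuery Mobile 1.2.0

console.log(monthsBetween(jquerySafe, published));       // 17
console.log(monthsBetween(jqueryMobileSafe, published)); // 61
```

Seventeen months is a little under a year and a half, and 61 months is just over five years, which brackets the range quoted above.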
Reason for hope
The picture is a bit grim right now—there’s no way to deny it. While this data doesn’t mean that all 77% of these sites are exploitable (it’s possible they could be avoiding the vulnerable methods), that’s a small consolation. That’s 77% of sites that are one developer making one method call away from being vulnerable. As we’ve seen in 2017, open-source vulnerabilities need to be taken very seriously.
But there’s also a bright side. While there are a large number of vulnerabilities in production, those vulnerabilities have been addressed in the libraries themselves. Each of the major libraries has versions available that are free of known security vulnerabilities—we just need to get them into production.
To get to a better situation, we need a few things to happen. The first is improved tooling and tooling adoption. According to our State of Open Source Security survey, 38% of people using open-source don’t use any sort of automated tools to help keep their packages up to date. I am willing to wager that if you were to look specifically at front-end JavaScript usage, you would see even lower adoption.
That number should improve. Improvements to npm and Yarn have made front-end package management much simpler for developers. Pairing a solid package management workflow with tools, like Snyk, that help you find, prevent, fix and monitor vulnerabilities in those packages will go a long way towards making the web more secure.
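To give a feel for the kind of check such tooling automates, here is a deliberately naive JavaScript sketch that compares a detected version against the "oldest version with no known vulnerabilities" table from earlier in this post. This is not how Snyk or Lighthouse works internally; the names `oldestSafe`, `compareVersions` and `isBelowOldestSafe` are hypothetical, only plain dotted numeric versions are handled, and the thresholds reflect the table as of this writing.

```javascript
// Thresholds copied from the table earlier in this post (subset).
const oldestSafe = {
  'jQuery': '3.0.0',
  'jQuery UI': '1.10.0',
  'Moment.js': '2.15.2',
  'AngularJS': '1.6.1',
};

// Compare dotted numeric versions: -1 if a < b, 1 if a > b, 0 if equal.
// Non-numeric or missing segments are treated as 0.
function compareVersions(a, b) {
  const partsA = a.split('.').map(Number);
  const partsB = b.split('.').map(Number);
  const length = Math.max(partsA.length, partsB.length);
  for (let i = 0; i < length; i++) {
    const diff = (partsA[i] || 0) - (partsB[i] || 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}

// true if the version predates the threshold, false if not,
// null if the library is not in the table.
function isBelowOldestSafe(library, version) {
  const threshold = oldestSafe[library];
  if (threshold === undefined) return null;
  return compareVersions(version, threshold) < 0;
}

console.log(isBelowOldestSafe('jQuery', '1.12.4'));   // true
console.log(isBelowOldestSafe('jQuery', '3.2.1'));    // false
console.log(isBelowOldestSafe('jQuery UI', '1.9.2')); // true
```

Segments are compared as numbers rather than strings, so 1.9.2 correctly sorts before 1.10.0. A real tool also has to track newly disclosed vulnerabilities, which a static table like this cannot do.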
The second thing we need is an increase in the general awareness and understanding of the problem. That is why we published the State of Open Source Security report: to shed light on the challenges faced in securing open source and to help find ways we can improve.
Having the vulnerable libraries audit in Lighthouse (and Sonar) also helps. These tools make it much easier for developers to spot issues on the sites they build. And thanks to the HTTP Archive and BigQuery, we have easy-to-access data to help us see how the problem scales.
While the data right now isn’t encouraging, improved awareness and improved tooling make this a solvable problem for the future.