
The npm faker package and the unexpected demise of open source libraries


September 2, 2022


Where do open source dependencies go to die, and why do they come to an end? What happened to the npm faker module? Can it happen again? Join me to learn how open source software libraries rise to glory and how they reach their end of life. I’ll also include some takeaways for developers and ops engineers.

I’d like to share several stories from the JavaScript and PHP ecosystems about abrupt disruptions of the software supply chain, driven by climate and environmental concerns as well as by legal disputes, politics, and humanitarian causes.

The main theme of our stories is open source protestware movements that emerged due to various issues around open source sustainability, climate change, and the freedom of open source software.

Another protestware story on npm involves the maintainer of the node-ipc npm package, who took action in response to a political crisis.

The left-pad JavaScript library and npm maintainer Azer Koçulu

This is the story of how legal action taken by a commercial company against one open source developer ended up breaking thousands of projects in the JavaScript ecosystem, among them Node.js and Babel.

Azer Koçulu, maintainer of the left-pad npm package, had another npm module he maintained called Kik. Apparently the Kik module attracted the attention of some corporate lawyers because of an instant-messaging application of the same name. Despite communication from lawyers and npm’s requests, Azer refused to rename the package.

Upon the threat of a lawsuit (and due to the npm organization not being able to establish agreement between both sides), Azer unpublished more than 250 of his npm packages from the npm registry as a sign of protest.

Unfortunately, one of those npm packages was left-pad. What is left-pad? It’s a tiny library: left-pad@1.0.0 was just 17 lines of code for padding strings with spaces. Even so, left-padding a string requires some edge-case handling, as the latest version, 1.3.0, published on April 9th, 2018, shows:

/* This program is free software. It comes without any warranty, to
 * the extent permitted by applicable law. You can redistribute it
 * and/or modify it under the terms of the Do What The Fuck You Want
 * To Public License, Version 2, as published by Sam Hocevar. See
 * http://www.wtfpl.net/ for more details. */
'use strict';
module.exports = leftPad;

var cache = [
  '',
  ' ',
  '  ',
  '   ',
  '    ',
  '     ',
  '      ',
  '       ',
  '        ',
  '         '
];

function leftPad (str, len, ch) {
  // convert `str` to a `string`
  str = str + '';
  // `len` is the `pad`'s length now
  len = len - str.length;
  // doesn't need to pad
  if (len <= 0) return str;
  // `ch` defaults to `' '`
  if (!ch && ch !== 0) ch = ' ';
  // convert `ch` to a `string` cuz it could be a number
  ch = ch + '';
  // cache common use cases
  if (ch === ' ' && len < 10) return cache[len] + str;
  // `pad` starts with an empty string
  var pad = '';
  // loop
  while (true) {
    // add `ch` to `pad` if `len` is odd
    if (len & 1) pad += ch;
    // divide `len` by 2, ditch the remainder
    len >>= 1;
    // "double" the `ch` so this operation count grows logarithmically on `len`
    // each time `ch` is "doubled", the `len` would need to be "doubled" too
    // similar to finding a value in binary search tree, hence O(log(n))
    if (len) ch += ch;
    // `len` is 0, exit the loop
    else break;
  }
  // pad `str`!
  return pad + str;
}
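The loop at the end of leftPad() is a small instance of the doubling trick used in fast exponentiation: each pass inspects one bit of len, appends the current chunk when that bit is set, and doubles the chunk for the next bit. As a rough illustration, here is an instrumented copy of just that loop (padWithCount is a hypothetical name, not part of left-pad) that counts the concatenations it performs:

```javascript
// Instrumented copy of left-pad's doubling loop. The helper name
// padWithCount is hypothetical; the loop mirrors leftPad(), minus the
// input coercion and the small-length cache.
function padWithCount(len, ch) {
  let pad = '';
  let steps = 0; // number of string concatenations performed
  while (true) {
    // append the current chunk when the lowest bit of len is set
    if (len & 1) { pad += ch; steps++; }
    // move on to the next bit
    len >>= 1;
    // double the chunk for the next bit, or stop when no bits remain
    if (len) { ch += ch; steps++; }
    else break;
  }
  return { pad, steps };
}

for (const n of [10, 100, 1000, 1000000]) {
  const { pad, steps } = padWithCount(n, '0');
  console.log(`len=${n}: pad length ${pad.length}, ${steps} concatenations`);
}
```

The count equals the number of 1-bits in len plus one less than its bit length, so padding to a million characters takes 26 concatenations here, where a naive one-character-at-a-time loop would take a million. This counts loop steps, not the total work done by the JavaScript engine.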

Left-pad received more than 2.5 million downloads a month, which made it very popular, and yanking it caused the projects that depended on it to fail: Node.js, Babel, and others (more here: http://left-pad.io/).

The fallout from dependencies being yanked out of the npm registry was unprecedented. No one had expected that someone would do this.

This became known as the story of how one developer just broke the JavaScript ecosystem with 17 lines of code.

How to left pad a string in JavaScript?

The String prototype includes padStart() and padEnd() methods, which take two arguments: the target length of the resulting string, and an optional string to pad with (a space by default). They return a new padded string; if the original string already has at least the target length, it comes back unchanged. This API is available from Node.js 8 onward and is supported by all modern browsers.

The following is a code example of left-padding a string in JavaScript:

const someString = 'h';
console.log(someString.padStart(5, '#'));

// output will be:
// ####h
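If you are migrating code off left-pad, a thin wrapper over padStart() can keep the old call signature. The following is a sketch under that assumption: leftPadCompat is a hypothetical name, and it copies left-pad’s rule that a missing fill defaults to a space while a numeric fill of 0 is kept.

```javascript
// Hypothetical drop-in helper that keeps left-pad's (str, len, ch) signature
// but delegates the padding itself to String.prototype.padStart().
function leftPadCompat(str, len, ch) {
  // left-pad coerces the input to a string, e.g. numbers
  str = String(str);
  // left-pad treats undefined, null, and '' as "use a space", but keeps 0
  if (!ch && ch !== 0) ch = ' ';
  return str.padStart(len, String(ch));
}

console.log(leftPadCompat('foo', 5));    // "  foo"
console.log(leftPadCompat(17, 5, 0));    // "00017"
console.log(leftPadCompat('foobar', 3)); // "foobar" (already long enough)
console.log('abc'.padEnd(6, '.'));       // "abc..."
```

One difference remains for multi-character fills: padStart() truncates the fill so the result is exactly the target length, whereas left-pad appends whole copies of the fill. For example, leftPad('a', 4, 'xy') returns 'xyxyxya', while leftPadCompat('a', 4, 'xy') returns 'xyxa'.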

Many credit the left-pad incident with sparking the conversation about a richer language API, and with it the addition of these methods to the ECMAScript specification (they shipped in ES2017).

The Faker library for PHP, Laravel, and other projects to generate fake data

Would you be able to part with a successful project if it meant a smaller environmental footprint? This next story is about a maintainer who was brave enough to set his success aside if it meant a greener Earth.

Meet François. In 2011, François Zaninotto released Faker, a PHP library for generating fake data, useful for database seeding and testing.

Faker grew over time, became very popular, with counterparts in other languages (such as Python), and came to support more than 70 locales. But Faker is a monolithic library: even if you just need to fake usernames in your tests, you still download the entire dataset, about 3MB of data.

However, most people use just one locale. For a development tool, 3MB isn’t tiny, but it isn’t very big either (think of the Chrome binaries downloaded for end-to-end testing). Except that by 2019, Faker was so popular that it had been downloaded over 121 million times across its lifetime.

Let me ask you this: how often have you considered the size of a library as a concern (aside from obvious frontend performance optimizations)? Over the years, all those Faker downloads have probably emitted more than 11 metric tons of CO2 equivalent.
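To put these numbers in perspective, here is a back-of-the-envelope calculation that uses only the figures quoted in this story (3MB per download, 121 million downloads, 11 metric tons). Treating 1 GB as 1,000 MB is an assumption, and the per-GB emission factor below is derived from those figures, not taken from a published source.

```javascript
// Back-of-the-envelope check of the Faker (PHP) figures quoted in the text.
// Assumption: decimal units, i.e. 1 GB = 1000 MB and 1 TB = 1000 GB.
const packageSizeMB = 3;          // approximate size of one Faker download
const lifetimeDownloads = 121e6;  // lifetime downloads by 2019
const emittedKgCO2e = 11000;      // "more than 11 metric tons", as a lower bound

const totalGB = (packageSizeMB * lifetimeDownloads) / 1000;
const totalTB = totalGB / 1000;
// Emission factor implied by the figures above (kg CO2e per GB transferred)
const impliedKgPerGB = emittedKgCO2e / totalGB;

console.log(`Data transferred: ~${totalTB} TB`);
console.log(`Implied emission factor: ~${impliedKgPerGB.toFixed(3)} kg CO2e per GB`);
```

That works out to roughly 363 TB of downloads, which implies about 0.03 kg of CO2e per GB transferred.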

François didn't want that on his conscience, so the project is now discontinued. But couldn’t François have handed it off to someone else? He had a bad experience with a past library he created, Propel, a PHP ORM, and it didn't work out well for him.

That sums it up for the Faker PHP library. The project’s repository on GitHub was archived. No new releases. No new pull requests. That said, this is open source, so anyone can fork the code. Developers offered to move Faker to a new organization with new maintainers, but that would have meant losing the project’s open source reputation (for example, its GitHub star count). So that didn't work out.

François wrote an end note to new maintainers and developers: "If you're interested, you can start by forking Faker. It will be much better than Faker. Just be aware that you'll probably regret it in a few years ;)".

The npm Faker library

The Faker.js library also generates massive amounts of random data for use in tests, but its story differs vastly from the PHP one above.

“No more free work,” said Marak Squires in a GitHub issue for one of his repositories back in November 2020.

Maintaining open source software is a great burden. The work spans coding, regular project maintenance (including documentation), publishing releases, updating changelog files, creating a welcoming environment for new developers and contributors who wish to join the project, answering questions, researching reported issues, and reviewing pull requests. These are just some of the tasks required to maintain a healthy open source project.

Surely, many of us developers, as well as for-profit companies, are using a lot of open source software. But are we giving anything back?

Marak is an open source developer who maintains several open source projects, including the popular faker.js npm package, which at one point received more than twelve million downloads a month. Don’t confuse this with the previous story about the Faker PHP library, though: these are different projects, different maintainers, and different stories.

Marak’s work was helping developers at Fortune 500 companies do their jobs and, ultimately, helping those businesses grow. But like many maintainers, he didn’t get much out of it. The work of maintainers is largely unfunded and unappreciated.

In November 2020, Marak decided to make a statement about open source sustainability, or the lack thereof, by opening an issue in his GitHub repository stating that he would not be doing any more free work.


Judging by the emoji reactions to Marak’s statement on this GitHub issue, it was largely accepted and recognized by the community. Marak was heard, but his actions didn’t leave a notable lasting impression.

On January 5th, 2022, the maintainer published a new version of the faker npm package numbered 6.6.6, so faker@6.6.6 became a valid version that anyone could download from the npm registry.

The new 6.6.6 version completely removed all traces of the original source code of the Faker.js npm package. Any new installs of this new major version would result in broken functionality.

Alongside faker@6.6.6, he also force-pushed a commit to the Faker.js GitHub repository that removed the entire source code.


How much impact would a broken npm package have on its consumers? To assess that, you need to understand how npm package versioning works and how the npm package manager resolves versions.

In short: if you already use faker in your projects, npm resolves its version from the semver range declared in your package.json (by default a caret range such as ^5.5.3, which accepts newer minor and patch releases), and a lockfile pins the exact version that was resolved. Either way, you stay in the “safe zone” of the major version of Faker.js you’re on: faker@5.5.3 is the last safe release, and moving to the broken faker@6.6.6 major version requires an explicit upgrade that is not applied automatically.
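To make the version-range point concrete, here is a simplified sketch of how a caret range such as ^5.5.3 is matched. It handles only plain major.minor.patch versions with a major version of 1 or higher; npm itself relies on the semver package, which also covers prereleases, 0.x versions, and more. The helper names are hypothetical.

```javascript
// Hypothetical helper: split "5.5.3" into numeric parts.
function parseSemver(version) {
  const [major, minor, patch] = version.split('.').map(Number);
  return { major, minor, patch };
}

// Simplified caret matching for major >= 1:
// ^M.m.p accepts versions >= M.m.p and < (M+1).0.0.
function satisfiesCaret(version, range) {
  const v = parseSemver(version);
  const base = parseSemver(range.replace(/^\^/, ''));
  if (v.major !== base.major) return false;          // never crosses a major
  if (v.minor !== base.minor) return v.minor > base.minor;
  return v.patch >= base.patch;
}

for (const version of ['5.5.2', '5.5.3', '6.6.6']) {
  console.log(version, satisfiesCaret(version, '^5.5.3'));
}
```

Running it prints false for 5.5.2, true for 5.5.3, and false for 6.6.6, which is why a caret range on faker 5.x never resolves to faker@6.6.6 on its own.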

So maybe this maintainer was making a statement, but no one actually felt the pain? The story continues: Marak also built a denial-of-service vulnerability into his colors npm package, which had about a hundred million monthly downloads.

Takeaways for responsible usage of open source libraries

Following are four practical recommendations on managing open source software to avoid the security impact of protestware npm packages and unmaintained software.

1. Efficiently manage your open source library infrastructure

What can we do as developers who depend on the availability of open source libraries? Re-installing packages from registries is inefficient and slow, and may introduce non-deterministic results.

To better handle scenarios in which packages and open source libraries become obsolete or are removed from public registries such as npm, PyPI, and others, we recommend using local private registries and caching proxies, like the open source Verdaccio project. We’ve interviewed Verdaccio maintainer Juan Picado; read the interview to learn more about managing your open source libraries.

2. Vet open source libraries

When adopting new open source packages, or considering alternatives to existing ones, spend time evaluating the community around the project, its past maintenance history and current activities, and its security posture. Other aspects are important as well, such as the quality of the project, and whether it has tests and available documentation.

One tool that helps developers assess the quality of npm packages is Snyk Advisor, a free resource that quantifies aspects of an open source package. For example, Snyk Advisor provides a health score for the nodemon npm package.
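Some basic signals can also be pulled straight from the public npm registry, which serves package metadata (the “packument”) as JSON at https://registry.npmjs.org/<package-name>. The sketch below assumes that document’s dist-tags, time, maintainers, and versions fields; vettingSignals is a hypothetical helper, and these signals are illustrative, not how Snyk Advisor computes its health score.

```javascript
// Hypothetical helper: derive a few rough vetting signals from an npm
// packument. The chosen signals are illustrative assumptions only.
function vettingSignals(packument, now = new Date()) {
  const latest = packument['dist-tags'].latest;
  // `time` maps each published version to its ISO publish timestamp
  const publishedAt = new Date(packument.time[latest]);
  const msPerDay = 86400000;
  return {
    latest,
    daysSinceLatest: Math.floor((now - publishedAt) / msPerDay),
    maintainerCount: (packument.maintainers || []).length,
    versionCount: Object.keys(packument.versions || {}).length,
  };
}

// Usage (Node.js 18+ has a global fetch; run inside an async function
// or an ES module):
// const res = await fetch('https://registry.npmjs.org/nodemon');
// console.log(vettingSignals(await res.json()));
```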


3. Use a lockfile to pin dependencies

Using a lockfile to pin dependencies to known and tested versions is a great way to achieve deterministic installs. This isn’t specific to the JavaScript ecosystem; it’s a well-known technique for avoiding surprises in CI and elsewhere.

If you’re new to lockfiles or unsure how they work, I recommend reading What is a package-lock.json to understand the semantics and mechanics of dependency management.
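Because npm lockfiles (lockfileVersion 2 and 3) record the exact version installed at every path under node_modules, you can audit them directly, for example to check whether any copy of faker resolved to 6.6.6. The following is a minimal sketch; findLockedVersions is a hypothetical helper, and it assumes the lockfile’s packages map keyed by install path.

```javascript
// Hypothetical helper: list every locked copy of a package, including
// nested (transitive) copies, from a parsed package-lock.json.
function findLockedVersions(lock, name) {
  const marker = 'node_modules/';
  const hits = [];
  for (const [path, meta] of Object.entries(lock.packages || {})) {
    // skip the root project ("") and workspace folders outside node_modules
    if (!path.includes(marker)) continue;
    // the package name is whatever follows the last "node_modules/"
    const pkgName = path.slice(path.lastIndexOf(marker) + marker.length);
    if (pkgName === name) hits.push({ path, version: meta.version });
  }
  return hits;
}

// Usage with Node.js:
// const lock = JSON.parse(require('fs').readFileSync('package-lock.json', 'utf8'));
// console.log(findLockedVersions(lock, 'faker'));
```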


4. Pin transitive dependencies

In certain situations, you may not be able to control the version of transitive dependencies added to your node_modules tree. This happens because those dependencies are part of other dependencies declared by other maintainers.

That said, npm and other package managers such as yarn and pnpm provide a way to override a nested dependency’s version by declaring it in your package.json file: npm uses the overrides field (npm 8.3 and later), yarn uses resolutions, and pnpm uses pnpm.overrides.

Here’s an example that became critically important in the fallout from the node-ipc npm package protestware and the peacenotwar module sabotage. It forces node-ipc to 9.2.1 or 10.1.0 instead of the newer releases in those ranges:

"overrides": {
  "node-ipc@>9.2.1 <10": "9.2.1",
  "node-ipc@>10.1.0": "10.1.0"
}
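To see what those two entries do, here is a simplified model (not npm’s actual resolver) of how an override selector remaps a dependency’s version. It understands only unscoped package names and space-separated >, >=, <, <= or exact comparators, and all helper names are hypothetical.

```javascript
// Hypothetical helpers: parse "10" or "9.2.1" into [major, minor, patch].
function parseVersion(text) {
  const parts = text.split('.').map(Number);
  while (parts.length < 3) parts.push(0); // "10" -> [10, 0, 0]
  return parts;
}

function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// Every space-separated comparator in the range must hold.
function satisfies(version, range) {
  const v = parseVersion(version);
  return range.trim().split(/\s+/).every((comparator) => {
    const [, op = '=', target] = comparator.match(/^(>=|<=|>|<)?(.+)$/);
    const c = compareVersions(v, parseVersion(target));
    switch (op) {
      case '>': return c > 0;
      case '>=': return c >= 0;
      case '<': return c < 0;
      case '<=': return c <= 0;
      default: return c === 0;
    }
  });
}

// Same selectors as the package.json snippet above.
const overrides = {
  'node-ipc@>9.2.1 <10': '9.2.1',
  'node-ipc@>10.1.0': '10.1.0',
};

// Return the overridden version, or the original if no selector matches.
function applyOverrides(name, version) {
  for (const [selector, replacement] of Object.entries(overrides)) {
    const at = selector.indexOf('@'); // unscoped names only
    if (selector.slice(0, at) === name && satisfies(version, selector.slice(at + 1))) {
      return replacement;
    }
  }
  return version;
}

for (const v of ['9.2.1', '9.2.2', '10.0.0', '10.1.1', '10.1.2']) {
  console.log(`node-ipc@${v} -> ${applyOverrides('node-ipc', v)}`);
}
```

With these overrides, 9.2.2 is rewritten to 9.2.1 and 10.1.1 and 10.1.2 are rewritten to 10.1.0, while 9.2.1 and 10.0.0 pass through unchanged.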

Closing words

Open source software security and activism are topics dear to my heart, and this blog post is based on a talk I gave at Cloud Native SecurityCon in Valencia in May 2022. If you’re interested, you can see the recording here:

I also highly recommend taking a proactive approach to open source dependency security by connecting your Git repositories to Snyk to receive automated security fixes and dependency version upgrades.