February 27, 2023
Node.js presents a single-threaded event loop to your application, which allows CPU-bound operations to block the main thread and create delays. The worker_threads module addresses this problem by providing a mechanism for running code in parallel using a form of threading.
In a previous post, you learned what worker threads are, their common use cases, and how to add them to your project. In this article, we'll look at the pitfalls of worker threads and how they differ from the multithreading implementations in other programming languages. We'll also tour five prominent libraries that make the worker_threads module easier to use.
Worker thread pitfalls and gotchas
The benefits of worker threads can be easily summarized: They're the only way to get something similar to multithreading when you're programming with Node.js. CPU-intensive operations, background processing, and any parallel code execution other than async I/O will need to be implemented using worker threads.
The module and the concept it implements come with several caveats though. You need to be aware of these before you start implementing your workers, as some situations shouldn't be parallelized with this mechanism.
Worker threads aren't true threads
The first and most prominent restriction of worker threads is that they aren't real threads in the conventional sense. Truly multithreaded applications allow the concurrent execution of multiple threads that share the same state. In this scenario, memory updated in one thread will be visible to the others, and implementing multithreaded code requires careful memory management to prevent race conditions.
Since each worker runs the file in its own V8 isolate, with a separate heap and event loop, there's no implicit memory sharing between the main program and the worker "thread." Instead, an event-based messaging system is provided so values can be exchanged between the two sides.
This code, which you looked at in detail in part one, creates a worker that receives a value (hello) from the main thread and sends it back in a different form (You said "hello".). The postMessage() function is used to send data to the opposite end of the main-worker thread divide. Variables set on one side aren't visible to the other.
There is an exception to this rule: you can use a SharedArrayBuffer to directly share memory between the threads by specifically allocating it as shared memory:
Saving this code to shared.js and executing the file emits the following output:
$ node shared.js
Int32Array(4) [ 1001, 0, 0, 0 ]
This works because the worker thread can access the shared memory region created by the main thread. When the main thread later inspects the data, it sees the changes written by the worker. This still isn't true state-sharing, because you have to manually make the array available to the worker using its workerData constructor option.
Spawning too many worker threads is expensive
Although workers start up quickly, there's always an associated overhead: each one needs its own V8 instance and event loop. Creating a worker is a relatively expensive operation, which makes worker threads unsuitable for lightweight tasks. They're best reserved for parallelizing CPU-bound work where the performance savings will easily outweigh the startup cost.
You can mitigate the inefficiencies by reusing a pool of worker threads, which allows you to avoid repeatedly incurring the expense of creating new ones. Libraries such as Piscina and Poolifier abstract away the complexity of managing a worker pool.
Using worker threads for I/O is wasteful
The nature of worker threads means they're unsuitable for I/O tasks. You don't need worker threads to read a file or fetch data over the network — better async alternatives are already built into Node.js.
The worker_threads documentation specifically advises against using the module for these situations. Creating and maintaining a worker with its own V8 instance is far less efficient than using Node's async I/O implementations. You'll end up harming performance, wasting resources, and writing redundant code if you implement these tasks as worker threads.
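As a point of comparison, the sketch below reads several files concurrently on the main thread with the built-in promise-based fs API and no workers at all. The readAll and demo helpers, the temporary directory, and the file names are hypothetical and exist only to make the example self-contained.

```javascript
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

// Reads every file concurrently; the event loop stays free while the
// operating system performs the I/O.
function readAll(paths) {
  return Promise.all(paths.map((file) => fs.readFile(file, "utf8")));
}

// Hypothetical demo: writes three small files to a temp directory,
// reads them back concurrently, then cleans up.
async function demo() {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "io-demo-"));
  const files = ["a.txt", "b.txt", "c.txt"].map((name) => path.join(dir, name));
  await Promise.all(files.map((file, i) => fs.writeFile(file, `file ${i}`)));
  const contents = await readAll(files);
  await fs.rm(dir, { recursive: true, force: true });
  return contents;
}

demo().then((contents) => console.log(contents));
```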
Debugging worker threads can be challenging
Pooled worker threads can be challenging to debug because there's not always a clear link between an event, the worker it's handled by, and the effect that's created. Trying to debug what's happening using console.log() statements is tedious and error-prone.
You can produce more useful diagnostic information by attaching an AsyncResource to your pool. This provides full async stack traces that track what's happening inside the pool, allowing you to see the full sequence of activities that lead up to an effect occurring.
Sharing memory using a SharedArrayBuffer also creates opportunities for problems to arise. You must use atomics or implement your own concurrency management system to prevent race conditions when accessing and modifying the shared memory. If race conditions do occur, they can cause strange symptoms in your application, and are often hard to identify, especially when they relate to memory that's used in many different places.
Top Node.js threading libraries
The worker_threads module focuses on the basics of creating worker threads and exchanging data with them. Here are five popular libraries that wrap the module to provide a more convenient interface or higher-level features, such as thread pooling.
Piscina makes it easier to work with pools of workers. You can create your own task queues, track their completion, and cancel a task executing on a worker if it turns out to be redundant.
Here's a simple Piscina example. Save this code to
Now add this code to
Install the Piscina package with the following command:
$ npm install piscina
When you run node main.js, you'll see You said hello appear in your terminal. Piscina provides a more convenient interface around the worker threads API.
Bree is a job scheduler for Node.js. It lets you execute async tasks at a specified interval. You can configure each task with concurrency limits, retry support, and cancellation. Bree uses worker threads internally to run task code outside the main loop.
Install Bree using npm:
$ npm install bree
Now create a file called bree-main.js with the following code:
Add the following code to
Running node bree-main.js will emit the time immediately and then every five seconds:
Worker for job "bree-job" online
The time is 11:45:30
Worker for job "bree-job" exited with code 0
Worker for job "bree-job" online
The time is 11:45:35
Worker for job "bree-job" exited with code 0
Poolifier is another worker pool implementation. It lets you handle multiple workers without the complexity of managing the pool yourself. Pools can be either fixed, meaning they contain a set number of reused workers, or dynamic, which means workers are added as required until the user-configured limit is reached.
You can create a simple pool to run a specified file in a worker thread by adding the following code to
Define the code to run in the fixed pool by adding the following content in
Now add the code to run in the dynamic pool to
Install the poolifier package from npm, and then run main.js with Node. You should see the following output as both thread pools start and run their jobs:
Running in the fixed thread pool
Running in the dynamic thread pool
The process will run until you terminate it by pressing Ctrl+C. Poolifier keeps the thread pools available to handle new tasks, blocking the process from exiting while some pools still exist.
Worker threads vs. other programming languages
Different programming languages implement multithreading in varying ways. In the case of Node.js, it's the isolated-worker system provided by the worker_threads module. Here's what some other languages offer.
C/C++: As low-level languages, C and C++ both feature true multithreading, for example via the POSIX threads (pthreads) library. C++ also has the std::thread class in its standard library for even simpler concurrency. You're responsible for using atomics and mutexes to synchronize memory access and avoid race conditions.
Java: Java also has real multithreading using the Thread class or its Runnable interface. These are supported by a comprehensive concurrency suite that helps you manage the threads you've created.
Python: Python has a threading library, but the most popular Python language implementation, CPython, can only execute Python bytecode in one thread at a time because of its global interpreter lock (GIL). This means that although threading seems to be available, it will not speed up CPU-intensive code. Python also offers the multiprocessing module, which runs code in separate processes that exchange messages, an approach comparable to Node.js worker threads.
Ruby: Ruby has a similar situation to Python. The language has comprehensive thread support but MRI Ruby, the most popular implementation, only supports one thread at a time. Newer alternative interpreters, such as JRuby and Rubinius, do have true multithreading support.
Rust: Rust has comprehensive multithreading support. Rust lets you easily create threads and share data between them with a low risk of errors. The language's design renders many common concurrency bugs impossible, so it's a great choice for projects that will heavily rely on multithreading.
Threading in C/C++, Rust, and Java will generally be much quicker than the worker model of Node.js. These languages expose real threads, with shared state and all the memory management concerns it brings. The mainstream implementations of higher-level interpreted languages like Python and Ruby, along with Node.js, don't run shared-state threads in parallel, relying instead on heavier isolated workers or subprocesses.
Worker threads give Node.js developers a way to run code in parallel by starting new threads that each run their own V8 instance. This isn't conventional multithreading, however: each "thread" is isolated and lacks access to its parent's context. Communication between threads is only possible using allocated shared memory and messages exchanged via an event listener.
Highly CPU-intensive code where performance is critical will run faster when using real threads in another programming language. Worker threads are sufficient for most Node.js use cases, though, such as job queues in web apps or background video processing on your desktop.