Demystifying asynchronous programming in NodeJS
What is NodeJS#
As an asynchronous event-driven JavaScript runtime, Node.js is designed to build scalable network applications - https://nodejs.org/en/about
What is event loop#
The event loop is what allows Node.js to perform non-blocking I/O operations — despite the fact that JavaScript is single-threaded — by offloading operations to the system kernel whenever possible - https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick#what-is-the-event-loop
CPU I/O Wait#
When processing an operation, ideally the CPU performs the computation entirely on its own. In many cases, however, the CPU has to retrieve data from memory, from disk, and/or over the network. The problem is that retrieving data from disk or over the network takes a very long time relative to CPU speed (memory access is also slower than the CPU's caches, but far less dramatically so, as the latency table below shows).
Since it takes so long (in CPU time) to retrieve data from disk and the network, the CPU essentially sits idle waiting for that data to arrive. This is where CPU I/O wait comes from. These operations are also known as I/O bound operations.
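To make the difference concrete, here is a minimal sketch (the file name data.txt is just a placeholder) contrasting a blocking read, where the process sits in I/O wait, with a non-blocking read, where the event loop is free to do other work until the data arrives:
const fs = require('fs');

// Blocking: the whole process waits here until the disk returns the data.
const blocking = fs.readFileSync('data.txt', 'utf8');
console.log('blocking read done:', blocking.length, 'characters');

// Non-blocking: the read is offloaded and the callback runs once the data is ready.
fs.readFile('data.txt', 'utf8', (err, data) => {
  if (err) throw err;
  console.log('non-blocking read done:', data.length, 'characters');
});

// This line runs before the non-blocking read completes.
console.log('event loop is free to do other work');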
These are the latency numbers from Jeff Dean, Distinguished Engineer at Google.
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms
If an operation is doing CPU-intensive calculation (e.g. cryptographic processing, 3D rendering), performing it on the event loop gives no benefit; on the contrary, it blocks the loop from handling anything else while it runs. These operations are also known as CPU bound operations.
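As a rough sketch of why: the hypothetical busyWork function below keeps the CPU busy for about two seconds, and while it runs the event loop cannot process anything else, so a timer scheduled beforehand fires late:
// A purely CPU bound loop: there is no I/O to offload, so the event loop gains nothing.
function busyWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin
  }
}

setTimeout(() => console.log('timer fired'), 100);

console.log('starting CPU bound work');
busyWork(2000);
console.log('CPU bound work done');
// "timer fired" is only printed now, roughly 2 seconds late, because the loop was blocked.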
Demo#
Let's conduct an experiment to see how the event loop works in NodeJS. It's not easy to simulate slow disk access (due to my limited knowledge, obviously), so let's simulate a slow web/API server instead.
Prepare a slow web server#
Let's prepare a simple slow web server. We'll be using ExpressJS with JavaScript for simplicity.
First, create a directory called app-server
and initialise an npm project inside it:
$ npm init -y
And then install express like below:
$ npm install express
Then copy the following source code into ./src/app.js:
const express = require('express');
const { setTimeout: setTimeoutPromiseBased } = require('timers/promises');

const app = express();
const port = 3000;

app.get('/', async (req, res) => {
  const start = new Date();
  await setTimeoutPromiseBased(2000);
  const end = new Date();
  res.send(`start: ${start} - end: ${end}`);
});

app.listen(port, () => {
  console.log(`[server]: Server is running at http://localhost:${port}`);
});
This is a very simple web server: whenever it receives a request, it sleeps for 2 seconds before responding. This could be one of your slow microservice APIs.
Finally, launch our slow web server with
$ node src/app.js
You can test the server by running a cURL command against it:
$ curl localhost:3000
start: Mon Apr 03 2023 22:04:28 GMT+0800 (Malaysia Time) - end: Mon Apr 03 2023 22:04:30 GMT+0800 (Malaysia Time)
Make HTTP requests from Python#
Before we make a request from NodeJS, let’s run a control experiment using Python first. Create a request.py
file with the following source code:
import requests

def httpRequest(n):
    print('start', n)
    res = requests.get('http://localhost:3000')
    print(res.text)
    print('end', n)

httpRequest(1)
httpRequest(2)
Executing the script will produce output like the following (the server's timestamps are elided here):
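start 1
start: <timestamp> - end: <timestamp>
end 1
start 2
start: <timestamp> - end: <timestamp>
end 2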
No matter how many times you run the script, the output is always the same. The script runs strictly sequentially: the first request goes out and its response comes back, and only then does the second request go out and return. No magic here.
Bonus: You can try writing this script using async Python as well.
Make HTTP requests from NodeJS#
Let’s see how this works in NodeJS now.
Create a file request.js
with the following source code (the global fetch API used here requires Node.js 18 or newer):
async function httpRequest(n) {
  console.log('start', n);
  const response = await fetch("http://localhost:3000");
  const data = await response.text();
  console.log(data);
  console.log('end', n);
}

httpRequest(1);
httpRequest(2);
Now when you run the script, you'll see output like this:
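start 1
start 2
start: <timestamp> - end: <timestamp>
end 1
start: <timestamp> - end: <timestamp>
end 2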
And sometimes, this output:
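start 1
start 2
start: <timestamp> - end: <timestamp>
end 2
start: <timestamp> - end: <timestamp>
end 1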
Why is that so?
Interpretation#
As an example, let’s consider a case where each request to a web server takes 50ms to complete and 45ms of that 50ms is database I/O that can be done asynchronously. Choosing non-blocking asynchronous operations frees up that 45ms per request to handle other requests. This is a significant difference in capacity just by choosing to use non-blocking methods instead of blocking methods. - https://nodejs.org/en/docs/guides/blocking-vs-non-blocking#concurrency-and-throughput
Note that asynchronous programming is not unique to NodeJS. It can also be done in Python using async functions (asyncio), in Rust using tokio-rs, in Java using Futures, and in many other languages.
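To tie this back to the demo, a quick way to see the throughput benefit in numbers is to time both requests together. Here is a minimal sketch that reuses the httpRequest function from request.js and wraps the calls in a hypothetical main function; against the 2-second server above, the concurrent version finishes in roughly 2 seconds instead of roughly 4:
async function httpRequest(n) {
  console.log('start', n);
  const response = await fetch("http://localhost:3000");
  console.log(await response.text());
  console.log('end', n);
}

async function main() {
  // Sequential: each await blocks the next request, so ~4 seconds in total.
  let t0 = Date.now();
  await httpRequest(1);
  await httpRequest(2);
  console.log('sequential took', Date.now() - t0, 'ms');

  // Concurrent: both requests are in flight at once, so ~2 seconds in total.
  t0 = Date.now();
  await Promise.all([httpRequest(1), httpRequest(2)]);
  console.log('concurrent took', Date.now() - t0, 'ms');
}

main();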