Before memorising phases and APIs, lock in this mental model. Node.js is not magic — it's one very efficient decision: don't wait, move on. While one request waits for a database response, serve ten others.
Imagine a café with one highly efficient waiter (the event loop) and a kitchen full of chefs (OS + libuv threads).
- The waiter takes your order and immediately moves to the next table — never stands watching the kitchen.
- When food is ready, the kitchen rings a bell → the waiter picks it up and delivers it.
- A traditional multi-threaded server hires one waiter per table — expensive, and most of them just stand waiting.
Node.js doesn't make I/O faster — it makes the time spent waiting productive.
Node.js shines at:

- Handling thousands of concurrent connections
- HTTP APIs, database calls, file reads (I/O-bound)
- Real-time apps — chat, live dashboards, notifications
- Streaming large files without loading all into memory

It struggles with:

- CPU-heavy work: video encoding, ML inference, crypto
- Workloads needing true parallel CPU threads
- Long synchronous blocking operations on the main thread
Under the hood, the operating system provides the async network I/O: epoll (Linux), kqueue (macOS), IOCP (Windows).

The event loop is Node's task scheduler. Before the phases table, understand the big picture: the event loop is a loop that runs forever, checking queues in a fixed order and running callbacks.
Each "tick" of the event loop is one lap around a racetrack. The track has fixed pit stops in a fixed order. Your callbacks sit at different pit stops waiting to be picked up:
- Timers pit stop — `setTimeout`/`setInterval` callbacks whose time has come
- I/O pit stop (poll) — file / network callbacks that just completed in the background
- Check pit stop — `setImmediate` callbacks
- Microtask express lane — `process.nextTick` and Promises cut the queue between every pit stop
The Event Loop is the mechanism that allows Node.js to perform non-blocking I/O operations despite JavaScript being single-threaded. It continuously checks whether there are tasks to execute and in what order.
Node.js uses libuv (a C library) under the hood to implement the event loop. Each "tick" of the loop passes through these 6 phases in order:
| # | Phase | What runs here | Key API |
|---|---|---|---|
| 1 | timers | Callbacks from `setTimeout` and `setInterval` whose delay has elapsed | `setTimeout` |
| 2 | pending callbacks | I/O callbacks deferred from the previous loop iteration (e.g. some TCP errors) | — |
| 3 | idle / prepare | Internal libuv use only — you cannot schedule work here directly | — |
| 4 | poll | Fetch new I/O events; execute I/O-related callbacks (file reads, network). The loop waits here if nothing else is pending. | `fs.readFile` |
| 5 | check | Callbacks scheduled with `setImmediate` | `setImmediate` |
| 6 | close callbacks | Close events — e.g. `socket.on('close', ...)` | `.on('close')` |
Between every phase transition, Node.js drains two microtask queues before moving to the next phase:
- process.nextTick queue — always drains first (highest priority)
- Promise microtask queue — drains after nextTick
Beware: recursive `process.nextTick` calls will starve I/O — the loop never advances past the microtask queues.
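A minimal sketch of that starvation in action — the timer below never fires because the nextTick queue never empties:

```js
// A self-scheduling nextTick never lets the loop advance past microtasks
function spin() {
  process.nextTick(spin); // re-queues itself before the queue can empty
}
spin();

setTimeout(() => console.log('timers phase reached'), 0); // never prints
```

A recursive `setImmediate`, by contrast, yields back to the loop on every iteration, so it does not starve I/O.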
```js
// What is the output order?
setTimeout(() => console.log('1. setTimeout'), 0);
setImmediate(() => console.log('2. setImmediate'));
process.nextTick(() => console.log('3. nextTick'));
Promise.resolve().then(() => console.log('4. Promise.then'));
console.log('5. synchronous');

/* Output:
   5. synchronous   ← sync code runs first (call stack)
   3. nextTick      ← nextTick queue (before promises)
   4. Promise.then  ← promise microtask queue
   1. setTimeout    ← timers phase (may swap with setImmediate
   2. setImmediate     depending on when the loop starts)
*/
```
"The event loop is Node's scheduler. Sync code runs first. Then microtasks — nextTick before Promises. Then the loop cycles through its 6 phases: timers → pending callbacks → idle → poll → check → close. The poll phase is where Node waits for I/O. setImmediate fires in the check phase, always after I/O callbacks — that's its guarantee."
### What's the difference between `process.nextTick`, `setImmediate`, and `setTimeout(fn, 0)`?

| API | When it runs | Use case |
|---|---|---|
| process.nextTick | After current operation, before any I/O or timers — highest priority microtask | Ensure a callback fires asynchronously but before anything else |
| Promise.then | After nextTick queue empties — second priority microtask | Standard async control flow |
| setImmediate | Check phase — guaranteed after I/O callbacks in the same loop iteration | Do something after I/O handlers finish |
| setTimeout(fn, 0) | Timers phase — minimum 1ms delay (OS-dependent), can slip past setImmediate outside I/O | Delay by at least one tick — less predictable than setImmediate |
```js
const fs = require('fs');

// OUTSIDE an I/O callback — the order of setTimeout vs setImmediate is UNDEFINED
setTimeout(() => console.log('timeout'), 0);
setImmediate(() => console.log('immediate'));
// Could print in either order — depends on OS timer resolution

// INSIDE an I/O callback — setImmediate ALWAYS wins
fs.readFile(__filename, () => {
  setTimeout(() => console.log('timeout'), 0);
  setImmediate(() => console.log('immediate')); // always first
});
```
Prefer `setImmediate` over `setTimeout(fn, 0)` when you need post-I/O ordering — it's deterministic.
Use process.nextTick sparingly and only when you truly need the highest priority.
"nextTick fires before any I/O in the current microtask checkpoint — it's the highest priority. setImmediate fires in the check phase, after I/O. setTimeout(0) fires in the timers phase and has at least a 1ms minimum delay, so its ordering versus setImmediate is non-deterministic outside I/O callbacks."
The event loop runs on a single thread. Every callback runs to completion before the next one starts. If a callback takes a long time — a CPU-heavy computation, a massive JSON.parse, a synchronous file read — no other request is served during that time.
- Synchronous I/O: `fs.readFileSync`, `fs.writeFileSync`
- Heavy computation: sorting millions of items, crypto without a worker
- Large JSON: `JSON.parse` of a 50 MB response body
- Regex with catastrophic backtracking (ReDoS)
- Infinite or deep synchronous recursion
```js
const fs = require('fs');
const { Worker } = require('worker_threads');

// ❌ BLOCKS the event loop — no request is served during this
function badRoute(req, res) {
  const data = fs.readFileSync('/huge-file.json');
  const parsed = JSON.parse(data); // still on the main thread
  res.send(parsed);
}

// ✅ Non-blocking — the event loop is free while the file is being read
async function goodRoute(req, res) {
  const data = await fs.promises.readFile('/huge-file.json');
  const parsed = JSON.parse(data); // JSON.parse still blocks! See below
  res.send(parsed);
}

// ✅ Offload CPU work to a worker thread
function parseInWorker(rawJson) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`
      const { workerData, parentPort } = require('worker_threads');
      parentPort.postMessage(JSON.parse(workerData));
    `, { eval: true, workerData: rawJson });
    worker.on('message', resolve);
    worker.on('error', reject);
  });
}
```
- clinic doctor — visualizes event loop lag over time
- perf_hooks: `monitorEventLoopDelay()` — measures lag in ms
- `--inspect` + Chrome DevTools — CPU profile to find hot synchronous code
```js
const { monitorEventLoopDelay } = require('perf_hooks');

const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();

setInterval(() => {
  console.log(`Event loop lag: ${h.mean / 1e6}ms mean`); // histogram reports nanoseconds
}, 5000);
```
JavaScript runs on a single thread — but the operating system and libuv's thread pool do the actual I/O work in the background. Node.js is the middleman that hands off work and gets notified when it's done.
```
┌─────────────────────────────────────┐
│ Your JavaScript Code (V8) │ ← Single thread
├─────────────────────────────────────┤
│ Node.js Core APIs (C++ bindings) │
├─────────────────────────────────────┤
│ libuv │
│ ┌─────────────┐ ┌───────────────┐ │
│ │ Event Loop │ │ Thread Pool │ │ ← 4 threads default
│ │ (1 thread) │ │ (UV_THREADPOOL│ │ (max 128)
│ └─────────────┘ │ _SIZE) │ │
│ └───────────────┘ │
├─────────────────────────────────────┤
│ Operating System │
│ (epoll/kqueue/IOCP - async I/O) │
└─────────────────────────────────────┘
```
- Network I/O (TCP/UDP, HTTP) — uses OS-level async APIs (`epoll` on Linux, `kqueue` on macOS, IOCP on Windows). The OS tells libuv when a socket is readable/writable. No thread pool needed.
- File system & DNS lookups — most OS file APIs are blocking, so libuv uses its thread pool (4 threads by default). A worker thread does the blocking call; when done, it signals the event loop.

You can grow the pool with an environment variable:

```bash
UV_THREADPOOL_SIZE=16 node app.js
```

This helps when many concurrent disk I/O or crypto operations queue up. The max is 128 threads.
1. `fs.readFile('data.txt', callback)` → registers the callback, hands the work to the libuv thread pool
2. The event loop continues processing other events (non-blocking)
3. A libuv thread performs the blocking OS `read()` call
4. The thread completes → posts the result to the event loop's poll queue
5. The poll phase picks it up → calls your callback on the main thread
"JavaScript is single-threaded but Node.js uses libuv to delegate I/O work — network I/O goes through OS-level async interfaces like epoll, while file system work uses libuv's thread pool. When work completes, a callback is queued on the event loop and your JS code picks it up — never blocking the main thread."
Your JavaScript code runs on one thread — that's the V8 thread. But the Node.js process as a whole uses multiple threads:
- 1 main thread — V8 + event loop (your JS)
- 4+ libuv threads — file system, DNS, crypto, zlib
- V8 internal threads — GC, JIT compilation (background)
- Worker threads (optional) — you can spawn via `worker_threads`
```js
const crypto = require('crypto');

// These 4 hashes run CONCURRENTLY on 4 libuv threads,
// despite JavaScript initiating them "one after another"
for (let i = 0; i < 4; i++) {
  crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', (err, key) => {
    console.log(`Hash ${i} done`);
  });
}
// All 4 finish at roughly the same time — parallel on the thread pool

// A 5th one has to wait for a free thread (default pool = 4)
crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', (err, key) => {
  console.log('Hash 4 done — waited for a free thread');
});
```
In traditional multi-threaded servers (like Java), each request gets its own thread. With 10,000 concurrent connections you need 10,000 threads — enormous memory overhead and context-switching cost.
Node.js uses a single thread + event loop. While waiting for I/O (database, file, network), the thread serves other requests. 10,000 concurrent connections can be handled by a handful of threads if most time is spent waiting for I/O.
If a request requires heavy computation — image resizing, video encoding, complex cryptography, ML inference — that work occupies the main thread. All other requests wait.
```js
// Simulating CPU work — blocks the event loop for ~2 seconds
function fibonacci(n) {
  if (n <= 1) return n;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

app.get('/slow', (req, res) => {
  const result = fibonacci(44); // ← blocks ALL requests for ~2s
  res.send({ result });
});

// Fix: offload to a worker thread
app.get('/fast', async (req, res) => {
  const result = await runInWorker(44); // non-blocking
  res.send({ result });
});
```
- Worker Threads — `worker_threads` module for parallel JS execution (sketched below)
- Child Processes — `child_process.fork` for separate Node processes
- Native Addons — offload to C/C++ via N-API
- Microservices — delegate to a Python/Go service better suited for compute
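For illustration, here is one possible shape for the `runInWorker` helper used in the `/fast` route above — a sketch that inlines the computation into a one-off worker (in production you would point the `Worker` at a separate file or reuse a worker pool):

```js
const { Worker } = require('worker_threads');

// Spawns a one-off worker that computes fibonacci(n) off the main thread
function runInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(`
      const { workerData, parentPort } = require('worker_threads');
      function fib(n) { return n <= 1 ? n : fib(n - 1) + fib(n - 2); }
      parentPort.postMessage(fib(workerData));
    `, { eval: true, workerData: n });
    worker.on('message', resolve);
    worker.on('error', reject);
  });
}
```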
Callbacks were the only async pattern in Node.js from v0.1 (2009) until Promises in ES6 (2015). Understanding them is still essential because:
- Many Node.js core APIs still use callbacks (`fs`, `crypto`, `dns`)
- EventEmitters are callbacks under the hood
- `util.promisify` is your bridge between the callback world and async/await

The error-first convention means the callback's first argument is always the error (`null` if none). This ensures you can't accidentally ignore errors.

```js
const { promisify } = require('util');
const fs = require('fs');

// Old callback-style API
fs.readFile('data.txt', 'utf8', (err, data) => {
  if (err) throw err;
  console.log(data);
});

// Promisified — now usable with async/await
const readFileAsync = promisify(fs.readFile);
const data = await readFileAsync('data.txt', 'utf8');

// Note: fs.promises already provides async versions natively:
const data2 = await fs.promises.readFile('data.txt', 'utf8');
```
A callback is a function passed as an argument to another function, to be called when an async operation completes. Node.js follows the error-first callback convention: (err, result) => {}
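For instance, a minimal sketch of implementing your own error-first API (`divide` is illustrative):

```js
// The error is always the FIRST argument; the result comes second
function divide(a, b, callback) {
  if (b === 0) {
    return callback(new Error('Division by zero')); // error path
  }
  callback(null, a / b); // success path — the error slot is null
}

divide(10, 2, (err, result) => {
  if (err) return console.error(err.message);
  console.log(result); // → 5
});
```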
```js
// ❌ Hard to read, error-prone, hard to maintain
fs.readFile('user.json', (err, userData) => {
  if (err) return handleError(err);
  db.getUser(userData.id, (err, user) => {
    if (err) return handleError(err);
    db.getOrders(user.id, (err, orders) => {
      if (err) return handleError(err);
      emailService.send(user.email, orders, (err) => {
        if (err) return handleError(err);
        console.log('Done!'); // deeply nested — the "pyramid of doom"
      });
    });
  });
});
```
```js
// ✅ Same logic — flat, readable, errors handled in one place
async function processUser() {
  try {
    const userData = JSON.parse(await fs.promises.readFile('user.json'));
    const user = await db.getUser(userData.id);
    const orders = await db.getOrders(user.id);
    await emailService.send(user.email, orders);
    console.log('Done!');
  } catch (err) {
    handleError(err); // one catch handles all failures
  }
}
```
| Feature | CommonJS (CJS) | ES Modules (ESM) |
|---|---|---|
| Syntax | require() / module.exports | import / export |
| Loading | Synchronous — blocks | Asynchronous — non-blocking |
| Evaluation | Dynamic — can require() inside a function | Static — imports resolved at parse time |
| Tree-shaking | Not possible | Bundlers can eliminate dead code |
| Top-level await | Not supported | Supported |
| File extension | .js (with type:commonjs) | .mjs or .js (with type:module) |
| `__dirname` / `__filename` | Available | Not available — use `import.meta.url` |
```js
// math.js
function add(a, b) { return a + b; }
module.exports = { add };

// app.js
const { add } = require('./math');

// Can also require dynamically:
if (condition) {
  const utils = require('./utils'); // dynamic require — valid in CJS
}
```
```js
// math.mjs
export function add(a, b) { return a + b; }

// app.mjs
import { add } from './math.mjs'; // must include the extension

// Dynamic import — works in ESM (returns a Promise)
// (renamed to avoid clashing with the static import above)
const { add: dynamicAdd } = await import('./math.mjs');

// __dirname equivalent in ESM
import { fileURLToPath } from 'url';
import { dirname } from 'path';
const __dirname = dirname(fileURLToPath(import.meta.url));
```
- Use CJS for existing Node.js projects and when publishing to npm (better ecosystem compatibility)
- Use ESM for new projects, browser-shared code, and when you need tree-shaking or top-level await
- Set `"type": "module"` in package.json to make all `.js` files ESM
The first time you require('./foo'), Node.js loads, compiles, and executes it, then caches the result in require.cache (keyed by resolved filename). Every subsequent require('./foo') returns the cached exports object — the module is not re-executed.
```js
// counter.js
let count = 0;
module.exports = {
  increment: () => ++count,
  get: () => count
};

// app.js
const a = require('./counter');
const b = require('./counter'); // same cached object
a.increment();
console.log(b.get()); // → 1 (a and b are the SAME object)

// Force a fresh load (rare — testing, hot reload)
delete require.cache[require.resolve('./counter')];
const c = require('./counter'); // fresh copy, count = 0
```
A circular dependency is when module A requires B, and B requires A. Node.js handles this by returning an incomplete (partial) exports object for the module currently being loaded.
```js
// a.js
console.log('a.js loading');
const b = require('./b');         // triggers b.js to load
console.log('b.done =>', b.done); // → true
module.exports = { done: true };

// b.js
console.log('b.js loading');
const a = require('./a');         // a is mid-load → gets {} (empty!)
console.log('a.done =>', a.done); // → undefined (partial export)
module.exports = { done: true };
```
When you call require('X'), Node.js resolves it using this algorithm:
1. Is `X` a core module? (`fs`, `path`, `http`…) → return it immediately
2. Does `X` start with `./`, `../`, or `/`? → it's a file path:
   - Try `X` exactly
   - Try `X.js`, `X.json`, `X.node`
   - Try `X/index.js`, `X/index.json`, `X/index.node`
3. Otherwise → look in `node_modules` folders, walking up the directory tree:
   - `./node_modules/X`
   - `../node_modules/X`
   - `../../node_modules/X` … up to the filesystem root
```js
// See exactly where a module was loaded from:
console.log(require.resolve('express'));
// → /project/node_modules/express/index.js

// Inspect the full module cache:
console.log(Object.keys(require.cache));

// package.json "main" field controls which file is the entry point
// package.json "exports" field (Node 12+) controls subpath exports
// package.json "type": "module" makes .js files use ESM
```
The `exports` field in package.json (introduced in Node 12) overrides the old resolution and lets package authors control exactly which files are exposed — preventing internal paths from being imported directly.
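A minimal sketch of what such an `exports` map might look like (the package name and paths are illustrative):

```json
{
  "name": "my-lib",
  "main": "./dist/index.js",
  "exports": {
    ".": "./dist/index.js",
    "./utils": "./dist/utils.js"
  }
}
```

With this map, `require('my-lib/dist/internal.js')` fails — only `.` and `./utils` resolve.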
Async patterns evolved to solve one compounding problem: how do you write code that waits for things without becoming unreadable or unsafe? Each generation fixed the previous generation's main pain point.
- Callbacks — the original pattern: `fs.readFile('f', (err, data) => { if (err) ... })`
- Promises — `.then()` chains replace nesting; one `.catch()` handles all upstream errors; still verbose for complex flows: `readFile('f').then(process).catch(handleError)`
- async/await — synchronous-looking code: `const data = await readFile('f');`
- Async iteration — streaming consumption: `for await (const chunk of stream) { ... }`
When you await a Promise, the async function is suspended and the event loop is freed to run other callbacks. When the Promise resolves, the function is re-queued as a microtask and resumes. No thread blocking — ever.
async/await is compiled to Promises. Promises use microtasks. Microtasks run between every event loop phase. Master the Promise model first — everything else is syntax on top.
await db.query() does NOT block the thread. It suspends only that async function's execution context. The event loop continues serving other requests while your query runs. This is the entire value of async in Node.js.
Understand the three states and how `.then` / `.catch` / `.finally` chaining works. When you order at a fast-food counter, they hand you a ticket stub. You don't stand blocking the counter. Instead:
- You hold the stub (pending) and do other things while food is prepared
- Your number is called — the stub redeems for real food (fulfilled)
- They ran out of ingredients — the stub gets cancelled with a reason (rejected)
- Once your number is called (or cancelled), it will never be called again — Promises settle exactly once
The stub works even if you show up late — attaching handlers via `.then()` to an already-settled Promise is valid; the handler fires immediately as a microtask.
A Promise is an object representing the eventual completion or failure of an async operation. It acts as a placeholder for a value that is not yet available.
| State | Meaning | Can transition to |
|---|---|---|
| pending | Initial state — neither fulfilled nor rejected | fulfilled or rejected |
| fulfilled | Operation completed successfully, has a value | — (terminal) |
| rejected | Operation failed, has a reason (error) | — (terminal) |
Calling `resolve()` after `reject()` has already been called is silently ignored — a Promise settles exactly once.
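A quick sketch of the settle-once rule in action:

```js
// A Promise settles exactly once — later resolve/reject calls are ignored
const p = new Promise((resolve, reject) => {
  reject(new Error('first — this wins'));
  resolve(42);                 // ignored — already settled
  reject(new Error('second')); // also ignored
});

p.catch(err => console.log(err.message)); // → 'first — this wins'
```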
```js
const p = new Promise((resolve, reject) => {
  // the executor runs SYNCHRONOUSLY
  setTimeout(() => resolve(42), 1000);
});

p
  .then(val => {
    console.log('Got:', val); // Got: 42
    return val * 2;           // value passed to the next .then
  })
  .then(val => console.log('Doubled:', val))  // Doubled: 84
  .catch(err => console.error('Error:', err)) // catches any error above
  .finally(() => console.log('Done'));        // always runs, passes the value through
```
```js
console.log('1 — sync');

Promise.resolve('hello')
  .then(v => console.log('3 —', v)); // microtask — runs AFTER the sync code

console.log('2 — sync');

// Output: 1 — sync, 2 — sync, 3 — hello
// Even though the Promise is already resolved, .then fires asynchronously
```
"A Promise has three states: pending, fulfilled, rejected — and once settled, it never changes. Each .then() returns a new Promise so you can chain. .catch() is shorthand for .then(null, handler). .finally() runs regardless of outcome and passes the value through — it's used for cleanup like hiding a loading spinner."
### What happens if you forget to `return` inside `.then()`?

Each `.then()` must return a value for the next handler to receive it. If you forget to return, the next `.then()` receives `undefined` — a very common silent bug.
```js
// ❌ Missing return — the next .then gets undefined
fetch('/api/user')
  .then(res => {
    res.json(); // ← no return!
  })
  .then(user => console.log(user)); // → undefined

// ✅ Correct — return the promise so it chains
fetch('/api/user')
  .then(res => res.json())          // returns the json() Promise
  .then(user => console.log(user)); // gets the actual user object
```
```js
// ❌ Nested .then — defeats the purpose of chaining
fetch('/api/user')
  .then(res => res.json()
    .then(user => fetch(`/api/orders/${user.id}`)
      .then(r => r.json())));

// ✅ Flat chain — return the inner promise to the outer chain
fetch('/api/user')
  .then(res => res.json())
  .then(user => fetch(`/api/orders/${user.id}`))
  .then(res => res.json())
  .then(orders => console.log(orders))
  .catch(err => console.error(err)); // one handler for all
```
Think of the chain as an assembly line: each `.then()` is a station that receives, transforms, and passes on the item. If a station doesn't hand the item back to the belt (no return), the next station gets nothing.
### How do `async`/`await` work under the hood? What does an `async` function actually return?

Think of an async function as a function that can pause its own execution at any `await` point — without blocking the thread. It's like a bookmark in a book.
You're watching live TV (your async function is running). Something comes up — you pause (await fires). You handle the interruption. When it's resolved, you resume from exactly where you paused. The TV station (event loop) never stopped broadcasting for everyone else while your TV was paused.
An async function can pause at `await` points without blocking the JS thread. Before the syntax existed, developers used `function*` (generator functions) + a Promise runner to achieve the same pause/resume behaviour. `async`/`await` is built on exactly that mechanism — V8 compiles it to generator-style code internally. `yield` in a generator = `await` in an async function.
- An `async` function always returns a Promise — even if you return a plain value, it gets wrapped in `Promise.resolve()`
- `await` suspends the function until the awaited Promise settles, then resumes with the resolved value (or throws on rejection)
```js
// These two are functionally equivalent:
async function fetchUser(id) {
  const res = await fetch(`/api/users/${id}`);
  const user = await res.json();
  return user;
}

function fetchUser(id) {
  return fetch(`/api/users/${id}`)
    .then(res => res.json());
}

// async always returns a Promise — even plain values get wrapped:
async function getValue() { return 42; }
getValue().then(console.log); // logs 42 — it IS a Promise

// await works on any thenable — even plain values:
const a = await 42;                       // valid — wraps in Promise.resolve(42)
const b = await Promise.resolve('hello');
```
```js
async function demo() {
  console.log('A');
  await Promise.resolve(); // suspends here — schedules the resume as a microtask
  console.log('C');        // runs AFTER the current sync code finishes
}
demo();
console.log('B');

// Output: A → B → C
// "B" runs before "C" because await yields to the event loop
```
async/await is transpiled to generator functions (function*) + a Promise-based runner by Babel/TypeScript. The yield in a generator suspends execution just like await does — the async/await syntax is just cleaner sugar on top.
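A minimal sketch of that generator-based mechanism (the `run` helper is illustrative — libraries like `co` follow this shape; for brevity, rejections here reject the outer Promise instead of being thrown back into the generator):

```js
// run() drives a generator: every yielded Promise is awaited, and its
// resolved value is fed back in — exactly what await does.
function run(genFn) {
  const gen = genFn();
  return new Promise((resolve, reject) => {
    function step(input) {
      let result;
      try {
        result = gen.next(input); // resume the generator with the last value
      } catch (err) {
        return reject(err);       // a throw inside the generator rejects run()
      }
      if (result.done) return resolve(result.value);
      Promise.resolve(result.value).then(step, reject);
    }
    step(undefined);
  });
}

// yield plays the role of await:
run(function* () {
  const a = yield Promise.resolve(1);
  const b = yield Promise.resolve(2);
  console.log(a + b); // → 3
});
```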
### What are the most common `async`/`await` mistakes? How do you spot and fix them?

```js
// ❌ user is a Promise object, not a user!
async function bad() {
  const user = getUser(1); // forgot await
  console.log(user.name);  // → undefined (no crash!)
}

// ✅
async function good() {
  const user = await getUser(1);
  console.log(user.name); // works
}
```
```js
// ❌ forEach ignores returned Promises — items process uncontrolled
async function processAll(items) {
  items.forEach(async (item) => {
    await processItem(item);
  });
  console.log('Done?'); // prints BEFORE the items finish!
}

// ✅ Option A: for...of (sequential)
async function processSequential(items) {
  for (const item of items) {
    await processItem(item); // truly waits for each
  }
  console.log('Done');
}

// ✅ Option B: Promise.all (parallel)
async function processParallel(items) {
  await Promise.all(items.map(item => processItem(item)));
  console.log('Done');
}
```
```js
// ❌ Total time: 300ms (100 + 200) — these are independent!
async function slow() {
  const user = await getUser(1);     // waits 100ms
  const orders = await getOrders(1); // then waits 200ms
}

// ✅ Total time: ~200ms (run together)
async function fast() {
  const [user, orders] = await Promise.all([
    getUser(1), // both start at the same time
    getOrders(1)
  ]);
}
```
If your `await` calls are not dependent on each other's result, run them in parallel with `Promise.all`. Sequential `await` is only correct when the second call needs the first's result.
In synchronous code, throw unwinds the call stack upward. In async code, an error inside an async function becomes a rejected Promise — it travels down the Promise chain instead, jumping to the nearest .catch().
Imagine a Promise chain as a series of water pipes. A crack (thrown error) at any point causes water (data) to stop flowing and instead activates a drain valve (the nearest .catch() downstream). All pipes between the crack and the drain are bypassed — their .then() handlers are skipped. If there's no drain at all, the water floods the floor — an unhandled rejection.
Inside an async function, a throw or a runtime error (like null.property) automatically rejects the returned Promise — no extra code needed.
If an awaited Promise rejects and there's no try/catch around it, the rejection propagates out of the async function as its own rejected Promise — bubbling up to the caller.
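A minimal sketch of that bubbling (the function names are illustrative):

```js
async function inner() {
  throw new Error('boom'); // rejects inner()'s Promise
}
async function middle() {
  return await inner();    // no try/catch → middle()'s Promise rejects too
}
async function outer() {
  try {
    await middle();        // the rejection surfaces here
  } catch (err) {
    console.log('caught at the top:', err.message); // → 'boom'
  }
}
outer();
```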
| Node version | Behaviour on unhandled rejection |
|---|---|
| Node 10–14 | Warning printed, process continues (dangerous!) |
| Node 15+ | Process crashes with exit code 1 (same as uncaught exception) |
```js
// ✅ try/catch — standard for async functions
async function fetchData(url) {
  try {
    const res = await fetch(url);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return await res.json();
  } catch (err) {
    console.error('Fetch failed:', err.message);
    throw err; // re-throw so the caller can decide
  }
}

// ✅ Global safety net — last resort, NOT a substitute for local handling
process.on('unhandledRejection', (reason, promise) => {
  console.error('Unhandled rejection:', reason);
  process.exit(1); // an explicit crash is safer than corrupt state
});

// ✅ Express: async route errors are NOT caught by default
const asyncHandler = fn => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next); // forwards to error middleware

app.get('/user/:id', asyncHandler(async (req, res) => {
  const user = await getUser(req.params.id);
  res.json(user);
}));
```
"In Node 15+ an unhandled rejection crashes the process — which is actually safer than continuing with corrupted state. The fix is always local try/catch in async functions and re-throwing when the caller needs to handle it. For Express, async errors aren't caught by default so you need a wrapper or the express-async-errors package."
### What are the `try/catch` scope gotchas with async code? When does it not catch what you expect?

```js
// ❌ The catch block never runs
async function tricky() {
  try {
    setTimeout(async () => {
      await mightFail(); // error thrown here...
    }, 1000);
    // ...but the try block has already exited by then!
  } catch (err) {
    console.log('never runs');
  }
}

// ✅ Move the try/catch inside the callback
setTimeout(async () => {
  try {
    await mightFail();
  } catch (err) {
    handleError(err);
  }
}, 1000);
```
```js
// ❌ If task2 rejects, you lose task1 and task3 results entirely
try {
  const [a, b, c] = await Promise.all([task1(), task2(), task3()]);
} catch (err) {
  // you only know SOMETHING failed, not which one
}

// ✅ allSettled when you need every result regardless of failures
const results = await Promise.allSettled([task1(), task2(), task3()]);
results.forEach((r, i) => {
  if (r.status === 'fulfilled') console.log(`task${i + 1}:`, r.value);
  else console.error(`task${i + 1} failed:`, r.reason);
});
```
```js
// ❌ This outer try/catch does NOT catch errors thrown inside .then()
try {
  somePromise().then(result => {
    throw new Error('oops'); // becomes a rejected Promise
  });
} catch (e) {
  console.log('never runs');
}

// ✅ Errors in .then() are caught by the next .catch() in the chain
somePromise()
  .then(result => { throw new Error('oops'); })
  .catch(e => console.log('caught:', e.message)); // works
```
### `Promise.all`, `Promise.allSettled`, `Promise.race`, and `Promise.any` — when do you use each?

| Method | Resolves when | Rejects when | Best for |
|---|---|---|---|
| Promise.all | ALL fulfill | ANY rejects (fail-fast) | Parallel calls that ALL must succeed |
| Promise.allSettled | ALL settle (any state) | Never | Bulk ops — want every result regardless |
| Promise.race | FIRST settles (resolve OR reject) | First settled Promise is a rejection | Timeout pattern, fastest resource wins |
| Promise.any | FIRST fulfills | ALL reject → AggregateError | Redundant sources, use fastest success |
```js
const p1 = new Promise(r => setTimeout(() => r('A'), 100));
const p2 = new Promise(r => setTimeout(() => r('B'), 200));
const p3 = new Promise((_, rej) => setTimeout(() => rej(new Error('C failed')), 150));

// .all — all or nothing
await Promise.all([p1, p2]);     // → ['A', 'B'] after 200ms
await Promise.all([p1, p2, p3]); // throws at 150ms, B still pending

// .allSettled — always waits, gives a status for each
await Promise.allSettled([p1, p2, p3]);
// → [{status:'fulfilled', value:'A'},
//    {status:'fulfilled', value:'B'},
//    {status:'rejected',  reason: Error}]

// .race — first to settle wins (resolve OR reject)
await Promise.race([p1, p2, p3]); // → 'A' at 100ms

// .any — first to FULFILL (skips rejections)
await Promise.any([p3, p1, p2]); // → 'A' at 100ms (p3's rejection ignored)
await Promise.any([p3]);         // throws AggregateError (all rejected)
```
```js
function withTimeout(promise, ms) {
  const timer = new Promise((_, reject) =>
    setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms)
  );
  return Promise.race([promise, timer]);
}

const data = await withTimeout(fetchData(), 5000);
```
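One caveat with the race pattern: the losing Promise keeps running in the background. When the operation supports cancellation — like Node 18+'s global `fetch` — an `AbortController` can actually stop it. A sketch:

```js
// A timeout that cancels the underlying request instead of just ignoring it
async function fetchWithTimeout(url, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    const res = await fetch(url, { signal: controller.signal }); // throws on abort
    return await res.json();
  } finally {
    clearTimeout(timer); // don't leave a stray timer on success
  }
}
```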
You cannot Promise.all(items.map(processItem)) when items has thousands of entries — you'd fire thousands of requests simultaneously, overwhelming the server, exhausting connection pools, or triggering rate limits.
```js
// ❌ Fires ALL 1000 requests simultaneously
const results = await Promise.all(urls.map(url => fetch(url)));

// ✅ Process at most 5 at a time using a worker-pool pattern
async function pLimit(taskFns, concurrency) {
  const out = new Array(taskFns.length);
  let cursor = 0;
  async function worker() {
    while (cursor < taskFns.length) {
      const i = cursor++;
      out[i] = await taskFns[i](); // each worker picks the next available task
    }
  }
  // Start `concurrency` workers — they race to consume the tasks
  await Promise.all(Array.from({ length: concurrency }, worker));
  return out;
}

const tasks = urls.map(url => () => fetch(url).then(r => r.json()));
const limited = await pLimit(tasks, 5); // max 5 in flight at any time

// Or use the battle-tested p-limit package (same idea, more robust):
// (renamed here to avoid clashing with the local pLimit above)
import pLimitPkg from 'p-limit';
const limit = pLimitPkg(5);
const out2 = await Promise.all(
  urls.map(url => limit(() => fetch(url).then(r => r.json())))
);
```
### What is `for await...of`? When do you use it instead of `Promise.all`?

`for await...of` iterates over an async iterable — any object implementing `[Symbol.asyncIterator]()`. It awaits each item before the loop body runs and before requesting the next item. This gives you natural backpressure.
```js
const { createReadStream } = require('fs');

// Async generator — produces pages lazily, on demand
async function* paginate(url) {
  let page = 1;
  while (true) {
    const data = await fetch(`${url}?page=${page}`).then(r => r.json());
    if (!data.length) break;
    yield data; // pause until the consumer asks for the next page
    page++;
  }
}

// The consumer controls the pace — the next page is only fetched
// when the loop body finishes
for await (const page of paginate('/api/users')) {
  await processPage(page); // slow processing doesn't flood the API
}

// Node.js Readable streams are async iterables (Node 10+):
const stream = createReadStream('large-file.txt', { encoding: 'utf8' });
for await (const chunk of stream) {
  process(chunk); // backpressure handled automatically
}
```
| | for await...of | Promise.all |
|---|---|---|
| Parallelism | Sequential (one at a time) | All run concurrently |
| Source | Stream / generator / unknown size | Fixed array of known tasks |
| Backpressure | Natural — consumer sets pace | None — all start immediately |
| Memory | Low — one item at a time | All results held in memory |
| Use when | Pagination, file streams, lazy data | Parallel API calls, fixed batch |
| | Promise | EventEmitter | Async Generator |
|---|---|---|---|
| Values | Single, once | Multiple, unbounded | Multiple, lazy |
| Model | Pull (one-shot) | Push | Pull |
| Backpressure | N/A | None built-in | Natural |
| Error handling | .catch / try-catch | 'error' event | try-catch in loop |
| Cancel | AbortController | removeListener | .return() |
```js
// Promise — single value, resolves once
const data = await fetchUser(1); // done, can't receive more

// EventEmitter — the producer pushes multiple values (push model)
const stream = getDataStream();
stream.on('data', chunk => process(chunk)); // producer controls the pace
stream.on('end', () => console.log('done'));
stream.on('error', err => handleError(err));

// Async Generator — the consumer pulls values (pull model)
async function* getDataStream() {
  yield await fetch('/chunk/1').then(r => r.json());
  yield await fetch('/chunk/2').then(r => r.json());
}
for await (const chunk of getDataStream()) {
  await process(chunk); // the next fetch only starts when I'm ready
}

// A Node Readable stream can be consumed BOTH ways:
const rs = fs.createReadStream('file.txt');
rs.on('data', chunk => { /* push style */ });
for await (const chunk of rs) { /* pull style (same stream) */ }
```
"Promises are for a single async value. EventEmitters are for multiple values pushed by the producer — no built-in backpressure. Async generators are for multiple values where the consumer controls the pace, naturally handling backpressure. Node streams are EventEmitters that also expose an async iterator interface."
Before diving into the questions, build this mental model. Every concept in this segment connects back to one core problem: how do you move large amounts of data efficiently without running out of memory?
Imagine you want to read a 4 GB video file. If you load the whole thing into RAM first, your server needs at least 4 GB free — just for one request. Ten concurrent users? 40 GB. This is why servers crash.
Instead of loading everything at once, a stream reads a small chunk (e.g. 64 KB), hands it to you, then reads the next chunk. Memory usage stays constant — 64 KB — no matter how big the file is.
When you watch Netflix, the video streams to you — a few seconds at a time. You don't wait for the full 2 GB movie to download before you can watch it. Node.js streams work exactly the same way: data flows piece by piece rather than all at once.
### What is a `Buffer` in Node.js? How does it differ from a `Uint8Array` and a regular JavaScript array?

Computers only understand numbers. Every piece of data — text, images, video, a JSON file — is ultimately stored as a sequence of numbers from 0 to 255. Each of these numbers is called a byte (8 bits). A byte can represent 256 different values (2⁸).
Computers agreed on a standard called ASCII (later UTF-8) that maps letters to numbers. The letter 'H' = 72, 'e' = 101, 'l' = 108, 'o' = 111. So the word "Hello" is stored in memory as the five bytes: 72 101 108 108 111. A Buffer lets you see and manipulate these raw bytes directly.
```
Character:   H        e        l        l        o
Decimal:     72       101      108      108      111
Hex:         0x48     0x65     0x6c     0x6c     0x6f
Binary:      01001000 01100101 01101100 01101100 01101111

Buffer: <Buffer 48 65 6c 6c 6f>   ← Node.js shows hex by default
```
JavaScript was designed for web pages — it only had strings and numbers. When Node.js arrived and needed to work with files, TCP sockets, and image data, strings were not enough — they add encoding overhead and can't represent arbitrary bytes. Buffer was created to fill this gap.
A string always applies an encoding (UTF-16 internally). You cannot efficiently represent raw image pixels, encrypted data, or network protocol headers as a string.
A Buffer holds arbitrary bytes — no encoding applied. You can read a PNG file byte by byte, or build a TCP packet header with exact byte values. Full control.
V8 (the JS engine) manages its own memory region called the heap — this is where your JavaScript objects, arrays, and strings live. The Garbage Collector (GC) watches the heap and frees unused objects automatically.
Buffers are allocated outside this heap, directly in C++ memory managed by Node.js. This means:
- Buffers don't add GC pressure — the GC doesn't need to scan them
- You can allocate large Buffers without hitting V8's heap size limit
- The downside: you must be careful about when they get freed
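You can see this split with `process.memoryUsage()` — a minimal sketch (exact numbers vary by platform and Node version):

```js
const before = process.memoryUsage();
const big = Buffer.alloc(100 * 1024 * 1024); // 100 MB, outside the V8 heap
const after = process.memoryUsage();

console.log('heapUsed growth (MB):',
  ((after.heapUsed - before.heapUsed) / 1e6).toFixed(1)); // ≈ 0 — tiny
console.log('external growth (MB):',
  ((after.external - before.external) / 1e6).toFixed(1)); // ≈ 100
```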
| | Buffer | Uint8Array | Regular Array |
|---|---|---|---|
| Memory location | Outside V8 heap (C++) | V8 heap (typed) | V8 heap (dynamic) |
| Values stored | Integers 0–255 (bytes) | Integers 0–255 | Any JS value |
| Size after creation | Fixed — cannot grow | Fixed — cannot grow | Dynamic — can push/pop |
| Encoding helpers | Yes — .toString(), .write()… | No | No |
| Performance for binary | Fastest | Fast | Slow |
| Works in browser | No — Node.js only | Yes | Yes |
Buffer is a subclass of Uint8Array. Every Buffer is a Uint8Array, but not every Uint8Array is a Buffer. Buffer adds Node-specific encoding/decoding methods on top.
```js
const buf = Buffer.from('Hello');
console.log(buf);        // <Buffer 48 65 6c 6c 6f> (hex values)
console.log(buf[0]);     // 72 — decimal value of 'H'
console.log(buf.length); // 5 — bytes, not characters

// ⚠ bytes ≠ characters for non-ASCII text
const euro = Buffer.from('€');
console.log(euro.length); // 3 — '€' needs 3 bytes in UTF-8
console.log('€'.length);  // 1 — JS strings measure UTF-16 code units

// Buffer IS a Uint8Array (subclass check)
console.log(buf instanceof Uint8Array);          // true
console.log(Buffer.isBuffer(buf));               // true
console.log(Buffer.isBuffer(new Uint8Array(5))); // false
```
"Buffer is Node's mechanism for working with raw binary data — the actual bytes flowing through files and networks. It's allocated outside V8's heap so it doesn't impact garbage collection, and since Node 4 it's a subclass of Uint8Array for browser compatibility. The key value over a plain Uint8Array is the encoding/decoding helpers like toString('base64') that are essential for I/O work."
### What's the difference between `Buffer.alloc`, `Buffer.allocUnsafe`, and `Buffer.from`? When does it matter?

When you create a Buffer, Node.js asks the operating system for a chunk of RAM. Think of RAM like a whiteboard. There are two ways to get a section of whiteboard:
Someone cleans the section before handing it to you — no traces of previous writing. Slower, but you know exactly what's on it: zeros.
You get the section immediately — but someone else's writing might still be there. Faster, but you might see their old data (passwords, private keys).
`Buffer.alloc(size)` fills every byte with `0x00` (zero). Safe — no leftover data. Use this when you'll write into it gradually.

```js
// Buffer.from — create from existing data
Buffer.from('hello', 'utf8');    // from a string, with an encoding
Buffer.from([0x48, 0x65, 0x6c]); // from an array of byte values
Buffer.from(anotherBuf);         // copy of another buffer

// Buffer.alloc — safe, zeroed memory
const safe = Buffer.alloc(10);
console.log(safe); // <Buffer 00 00 00 00 00 00 00 00 00 00> — all zeros

// Buffer.allocUnsafe — fast but raw memory (may contain old bytes!)
const unsafe = Buffer.allocUnsafe(10);
// May print: <Buffer a3 00 f2 11 ...> — leftover bytes from earlier allocations
unsafe.fill(0); // you MUST overwrite it yourself before exposing it
```
When memory inside your Node.js process is freed (from a previous Buffer, for example), the bytes are not erased — they're just marked "available." `allocUnsafe` hands you that raw memory immediately. If it previously held a user's password or auth token from elsewhere in your application, and you send your Buffer over the network before overwriting it, you've just leaked sensitive data.
Buffer.allocUnsafe only when you will immediately overwrite the entire buffer (e.g. a stream fills it directly from disk). For anything you might expose or send externally, use Buffer.alloc.
An encoding is the rule for converting between bytes and text. The same bytes mean different things under different encodings.
| Encoding | What it does | Use it for |
|---|---|---|
utf8 | Standard text — 1–4 bytes per character. Supports every language. | Reading/writing text files, JSON, most web data |
hex | Each byte shown as 2 hex digits (0–9, a–f). Not compressed. | Debugging, checksums, crypto hashes |
base64 | Converts binary to printable ASCII (A–Z, a–z, 0–9, +, /). ~33% larger. | Embedding images in HTML, JWT tokens, email attachments |
ascii | 7-bit ASCII only — 1 byte per character, English only. | Legacy systems, simple protocols |
binary | Latin-1 — one byte per character, 256 characters max. | Binary protocol headers, legacy data |
```js
const buf = Buffer.from('hello');
buf.toString('utf8');   // → 'hello'      (human-readable text)
buf.toString('hex');    // → '68656c6c6f' (good for debugging)
buf.toString('base64'); // → 'aGVsbG8='   (good for HTTP/JSON)

// Round-trip: base64 → Buffer → utf8
const encoded = buf.toString('base64');                    // 'aGVsbG8='
const decoded = Buffer.from(encoded, 'base64').toString(); // 'hello'

// Utility operations
const combined = Buffer.concat([buf1, buf2, buf3]); // join multiple buffers
const isEqual = buf1.equals(buf2); // byte-by-byte compare
// (for secrets, use crypto.timingSafeEqual — .equals is NOT constant-time)
```
A stream is not a new idea — it is the same concept as a factory assembly line. Raw material arrives at one end, workers transform it piece by piece, and finished products come out the other end. No worker waits for every piece in the world to arrive before starting work.
Without streams (load everything first):

```
┌────────────────────────────────────────────────────┐
│  Read ENTIRE 2 GB file into RAM → process → write  │
│  Memory needed: 2 GB                               │
└────────────────────────────────────────────────────┘
```

With streams (process chunk by chunk):

```
┌──────────┐  chunk   ┌───────────┐  chunk   ┌──────────┐
│   READ   │ ───────► │  PROCESS  │ ───────► │  WRITE   │
│ (source) │  64 KB   │(transform)│  64 KB   │  (sink)  │
└──────────┘          └───────────┘          └──────────┘
Memory needed: ~64 KB (just one chunk at a time)
```
Every Node.js stream is an EventEmitter — meaning it communicates by firing named events ('data', 'end', 'error') rather than returning values directly. You listen for events to know when data arrives or when the stream is done.
A Readable stream is like a newspaper being printed: pages come off the press one at a time (chunks). A Writable stream is the delivery van — it accepts pages and eventually delivers the complete paper. A Transform stream is the editor in between, reviewing each page before it goes to the van. You don't wait for the entire edition to be printed before delivery begins.
The internal buffer is a small waiting area between two stages. If the van is full, the press operator doesn't just throw papers on the floor — the press pauses. That "van is full, press pauses" signal is backpressure (covered in Topic C).
Data can flow in different directions depending on the use case. Node.js has four stream types to cover every direction and combination. All four extend EventEmitter, so they communicate through events ('data', 'end', 'drain', 'error', etc.).
| Type | Direction | Analogy | Key events | Real examples |
|---|---|---|---|---|
| Readable | Produces data — flows out to you | A tap you open to get water | `data`, `end`, `error` | `fs.createReadStream`, HTTP request body, `process.stdin` |
| Writable | Consumes data — you pour data in | A drain that accepts water | `drain`, `finish`, `error` | `fs.createWriteStream`, HTTP response, `process.stdout` |
| Duplex | Both — independently in each direction | A telephone — you send and receive separately | Both sets | TCP socket (`net.Socket`), WebSocket |
| Transform | Both — but input becomes the output | A blender — what goes in comes out changed | Both sets | `zlib.createGzip()`, crypto cipher, CSV parser |
A regular function returns a value when called. A stream cannot do that — data arrives later and in pieces. Instead, a stream fires events. Your code listens for those events:
- When a chunk is ready, the stream emits `'data'` with that chunk as the argument.
- Your `'data'` listener runs with the chunk — process it, write it, transform it.
- When the source is exhausted, the stream emits `'end'`. Your cleanup code runs here.
- If anything goes wrong, the stream emits `'error'`. Always handle this or the process crashes.

Every stream has a small internal queue called its internal buffer. Data accumulates there between the time it is produced and the time the consumer reads it. The `highWaterMark` setting controls the maximum size of that queue before the stream signals "I'm full, slow down."
```js
const fs = require('fs');
const zlib = require('zlib');

// Readable — produces data from disk chunk by chunk
const readable = fs.createReadStream('input.txt', {
  highWaterMark: 64 * 1024 // 64 KB internal buffer (default)
});

// Transform — compresses each chunk as it flows through
const gzip = zlib.createGzip();

// Writable — writes compressed chunks to a file
const writable = fs.createWriteStream('output.txt.gz');

// pipe() connects them: readable feeds gzip, which feeds writable
readable.pipe(gzip).pipe(writable);

// Listening to events from each stream
readable.on('data', chunk => console.log('read chunk:', chunk.length));
readable.on('end', () => console.log('reading done'));
writable.on('finish', () => console.log('file fully written'));
readable.on('error', err => console.error('read error:', err));
```
A Readable stream can deliver data in two ways. The question is: does the stream push data to you automatically, or do you ask for it explicitly?
Paused mode is like a tap: water only flows when you turn the handle. You are in control. Flowing mode is like an automatic sprinkler: water comes out on its own schedule. If your bucket is not ready when the water arrives, it spills on the floor (data loss).
```
┌──────────────────────────────────┐
│ Created — starts in PAUSED mode │
└──────────────┬───────────────────┘
│
┌────────────────────────▼──────────────────────────┐
│ PAUSED MODE │
│ Data sits in internal buffer waiting to be pulled │
│ You call stream.read() to get a chunk │
└──────┬──────────────────────────────┬─────────────┘
│ │
┌────────────▼──────────┐ ┌─────────────▼────────────┐
│ stream.on('data', …) │ │ stream.pipe() │
│ stream.resume() │ │ (also switches to flow) │
└────────────┬──────────┘ └─────────────┬────────────┘
│ │
┌──────▼──────────────────────────────▼─────────────┐
│ FLOWING MODE │
│ Stream pushes data to you as fast as it arrives │
│ 'data' event fires for every chunk │
└────────────────────────┬───────────────────────────┘
│
stream.pause() / stream.unpipe()
│
back to PAUSED
```
The 'readable' event fires when there is data available in the internal buffer. You call stream.read(size) to pull a chunk. This gives you fine-grained control over exactly how much you consume at once.
```js
const stream = fs.createReadStream('file.txt');

// The stream starts paused — nothing is flowing yet
stream.on('readable', () => {
  // Fires whenever data becomes available in the internal buffer
  let chunk;
  while ((chunk = stream.read(1024)) !== null) {
    // read(1024) pulls 1024 bytes at a time;
    // returns null when the internal buffer is empty
    processChunk(chunk);
  }
});

stream.on('end', () => console.log('All data consumed'));
```
Attaching a 'data' listener switches the stream to flowing mode immediately. From that point, 'data' fires for each chunk as fast as the source can produce it. You can call stream.pause() to stop the flow and stream.resume() to restart it.
```js
const stream = fs.createReadStream('file.txt');

// Attaching 'data' immediately switches to FLOWING mode
stream.on('data', chunk => {
  console.log('Got chunk of', chunk.length, 'bytes');
  // If processing this chunk is slow, pause the stream:
  stream.pause(); // stop the flow
  doSlowWork(chunk).then(() => {
    stream.resume(); // resume when ready
  });
});

stream.on('end', () => console.log('stream finished'));
stream.on('error', err => console.error('error:', err));

// In practice: pipe() does the pause/resume automatically
stream.pipe(writable); // you almost never manage modes manually
```
If the stream switches to flowing mode (via `.resume()` or an early `pipe()`) before you attach your `'data'` listener, chunks emitted in that gap are permanently lost — no error, no warning. Always attach listeners before any flow-triggering call.
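A minimal sketch of the trap (`file.txt` is illustrative):

```js
const fs = require('fs');

const stream = fs.createReadStream('file.txt');
stream.resume(); // flowing mode with no consumer — chunks are discarded

setTimeout(() => {
  // Attached too late: everything emitted during the gap is gone
  stream.on('data', chunk => console.log('got', chunk.length, 'bytes'));
}, 1000);
```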
### What is backpressure, and how does `pipe()` handle it automatically?

Before we touch code, understand the real-world problem. A producer is anything that generates data (reading a file from disk, receiving bytes over a network). A consumer is anything that processes that data (writing to another disk, inserting into a database). Producers and consumers rarely run at the same speed.
A fire hose pumps water at 500 litres per minute. Your bucket drains at 10 litres per minute. Within seconds the bucket overflows — water goes everywhere.
In Node.js: the "fire hose" is your Readable stream (disk reads at 500 MB/s). The "bucket" is your Writable stream (database writes at 10 MB/s). The unprocessed data doesn't vanish — it piles up in RAM. Eventually the server runs out of memory and crashes.
Backpressure is the signal that lets the bucket shout "STOP!" to the hose when full, and "GO!" when it has room again. In Node this signal is the return value of write().
1. The producer calls `write(chunk)` on the Writable.
2. When the Writable's internal buffer is full, `write()` returns `false` — meaning "I'm full, stop sending."
3. A naive producer ignores the `false` and keeps pushing. The buffer grows 64 KB → 1 MB → 500 MB → OOM crash (process killed).
4. A well-behaved producer calls `pause()` and stops reading. The Writable drains its buffer and emits `'drain'`.
5. The producer hears `'drain'` and calls `resume()`. Data flows again. Memory stays constant — always around 64 KB.

Backpressure is the mechanism that lets a slow consumer signal a fast producer to slow down. Without it, unread data accumulates in the stream's internal buffer, consuming unbounded memory — and eventually crashing the process.
```
Producer (network, disk)  →  [internal buffer]  →  Consumer (disk, DB)
   reads: 500 MB/s              fills up!            writes: 10 MB/s
```

- Without backpressure: the buffer grows until an OOM crash
- With backpressure: the producer pauses when the buffer is full and resumes on `'drain'`
```js
// pipe() is roughly equivalent to this:
readable.on('data', (chunk) => {
  const ok = writable.write(chunk);
  // write() returns false when the internal buffer is full (highWaterMark reached)
  if (!ok) {
    readable.pause(); // stop asking for more data
    writable.once('drain', () => {
      readable.resume(); // the writable has space again — keep going
    });
  }
});
readable.on('end', () => writable.end());

// pipe() does all of this for you in one line:
readable.pipe(writable);
```
```js
const fs = require('fs');
const { Readable } = require('stream');

// Default highWaterMark = 16 KB for generic byte streams
// (fs.createReadStream defaults to 64 KB)
const stream = fs.createReadStream('file.txt', {
  highWaterMark: 64 * 1024 // 64 KB chunks — larger = fewer I/O ops, more memory
});

// For object streams (highWaterMark = 16 objects by default)
const objectStream = new Readable({
  objectMode: true,
  highWaterMark: 100 // buffer up to 100 objects
});
```
"Backpressure is the feedback signal from a slow consumer to a fast producer saying 'pause until I'm ready.' pipe() implements it by checking the return value of writable.write() — when it returns false the buffer is full and pipe() pauses the readable. When the writable emits 'drain', pipe() resumes. Without this, data piles up in memory and you OOM."
OOM = Out Of Memory. Every Node.js process has a memory limit (roughly 1.5 GB on 64-bit systems by default, though this can be increased). When your code allocates more memory than that limit, Node.js does not gracefully handle it — the OS kills the process with a fatal error:
```
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: 0xb7c9e0 node::Abort() [node]
...
Aborted (core dumped)
```

Your entire server is dead. Every active request is dropped. Users get connection refused. Logs stop. Alerts fire.
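The heap cap can be raised with a V8 flag — though for I/O-heavy workloads, fixing backpressure is the real cure, not a bigger heap:

```bash
# Raise the old-space heap limit to 4 GB (value in MB)
node --max-old-space-size=4096 app.js
```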
Imagine 50 concurrent users uploading 100 MB files. Each upload reads a file and writes to a slow database. Without backpressure, each upload holds its entire 100 MB in the Writable's buffer while waiting for the DB. That is 50 × 100 MB = 5 GB of buffered data — well above the 1.5 GB limit. Server crashes, taking all 50 users' uploads with it.
With backpressure: each stream pauses when the DB is busy. Actual memory usage: 50 × 64 KB ≈ 3 MB. Server never breaks a sweat.
```js
const src = fs.createReadStream('/dev/urandom'); // fast source
const dest = fs.createWriteStream('output.bin'); // slow sink

// ❌ Ignores backpressure — the writable's buffer grows without bound
src.on('data', (chunk) => {
  dest.write(chunk); // return value ignored — the buffer fills up!
});
// Result: dest's internal buffer balloons → GC pressure → eventual OOM
```
```js
// ✅ Manual backpressure handling
src.on('data', (chunk) => {
  const canContinue = dest.write(chunk); // false = buffer full
  if (!canContinue) {
    src.pause();                            // stop producing
    dest.once('drain', () => src.resume()); // resume when drained
  }
});
src.on('end', () => dest.end());
```
```js
// ✅ pipe() — handles backpressure, but errors don't propagate
src.pipe(dest);

// ✅ pipeline() — handles backpressure AND errors AND cleanup
const { pipeline } = require('stream/promises');
await pipeline(
  fs.createReadStream('huge.txt'),
  zlib.createGzip(),
  fs.createWriteStream('huge.txt.gz')
);
// If any stream errors: all streams are destroyed, file descriptors closed
```
### How does a Transform stream work — what do `_transform` and `_flush` do?

Picture a factory with two conveyor belts — one bringing in raw parts, one carrying out finished goods. A machine in the middle takes each raw part, modifies it (drills a hole, paints it, stamps it), and places the finished piece on the outgoing belt.
That machine is a Transform stream. The incoming belt is the writable side (you write data in). The outgoing belt is the readable side (transformed data comes out). The _transform() method is the work the machine does to each piece.
1. Data arrives on the writable side — via `write(chunk)` or via `pipe()` from an upstream source.
2. Your `_transform(chunk, encoding, callback)` method is called. This is where you do the work: uppercase it, compress it, parse it, encrypt it.
3. Call `this.push(result)` to put the transformed data on the readable (outgoing) side. You can push zero, one, or many chunks per input chunk.
4. Call `callback()` to tell the stream "I'm done with this chunk — send me the next one." The stream blocks new input until you call `callback`.
5. When the input ends, `_flush(callback)` is called once. Push any remaining buffered output here, then call `callback` to end the stream.

`_flush` is called exactly once at the end — it is your last chance to push anything remaining.
A Transform stream is a Duplex where data written to the writable side is transformed and appears on the readable side. You implement _transform(chunk, encoding, callback) to define the transformation. _flush(callback) is called once when the writable side ends — your chance to push any remaining buffered output.
```js
const fs = require('fs');
const { Transform } = require('stream');

class UpperCaseTransform extends Transform {
  _transform(chunk, encoding, callback) {
    // chunk is a Buffer (or a string if decodeStrings: false)
    const upper = chunk.toString().toUpperCase();
    this.push(upper); // push to the readable side
    callback();       // signal the chunk is fully processed (ready for the next)
    // shorthand: callback(null, upper) — push + signal in one call
  }

  _flush(callback) {
    // Input ended — push anything remaining (e.g. buffered partial lines)
    this.push('\n--- END OF DOCUMENT ---\n');
    callback();
  }
}

// Usage
fs.createReadStream('input.txt')
  .pipe(new UpperCaseTransform())
  .pipe(fs.createWriteStream('output.txt'));
```
```js
const { Transform } = require('stream');

class CSVLineParser extends Transform {
  constructor() {
    super({ objectMode: true }); // output objects, not Buffers
    this._buffer = '';
    this._headers = null;
  }

  _rowFromLine(line) {
    const vals = line.split(',');
    return Object.fromEntries(this._headers.map((h, i) => [h, vals[i]]));
  }

  _transform(chunk, _, cb) {
    this._buffer += chunk.toString();
    const lines = this._buffer.split('\n');
    this._buffer = lines.pop(); // keep the incomplete last line for later

    for (const line of lines) {
      if (!this._headers) {
        this._headers = line.split(',');
        continue;
      }
      this.push(this._rowFromLine(line)); // push the parsed object downstream
    }
    cb();
  }

  _flush(cb) {
    // Parse whatever is left (the final line has no trailing newline)
    if (this._buffer.trim() && this._headers) this.push(this._rowFromLine(this._buffer));
    cb();
  }
}
```
Both Duplex and Transform streams have a writable side (data goes in) and a readable side (data comes out). The crucial difference is whether those two sides are connected to each other.
The writable and readable sides do not know about each other. Data written in does NOT become the data that comes out. They are separate channels that happen to share one object.
Real example: a TCP socket — you write a request to the server, and the server writes a response back. Those are two separate data flows.
What you write in IS what comes out — but modified. The writable side feeds directly into the readable side through your _transform() logic.
Real example: gzip — you write plain text in, compressed bytes come out. Same data, transformed.
Duplex:
[write side] ──────────────────────── you write data in
[read side] ──────────────────────── data comes out (independently produced)
↑ Two unconnected pipes sharing one object
Transform:
[write side] → [_transform()] → [read side]
↑ Data written in flows through your logic and emerges out the other end
| | Duplex | Transform |
|---|---|---|
| Relationship | Read/write sides are independent | Written data becomes readable data (possibly modified) |
| Analogy | Telephone — you talk and listen separately | Blender — input becomes transformed output |
| Internal buffer | Two separate buffers | Two separate buffers, but data flows from write to read |
| Example | TCP socket, WebSocket | gzip, crypto cipher, serializer |
const { Duplex, Transform } = require('stream'); // Duplex — read and write sides are completely independent class EchoDuplex extends Duplex { _read(size) { // Produce data for the readable side (independently) this.push('data from readable side'); this.push(null); // end the readable side } _write(chunk, enc, cb) { // Handle incoming writes (independently of _read) console.log('received:', chunk.toString()); cb(); } } // Transform — writable input IS the readable output (transformed) class ReverseTransform extends Transform { _transform(chunk, enc, cb) { // What comes in, goes out reversed const reversed = chunk.toString().split('').reverse().join(''); cb(null, reversed); // push reversed chunk to readable side } }
Node.js runs on the V8 JavaScript engine (the same engine inside Google Chrome). V8 manages memory using a garbage collector, and by default it limits your JavaScript heap to roughly 1.5 GB on 64-bit systems. If your code tries to allocate more than that, Node.js does not gracefully degrade — the process aborts with a fatal "JavaScript heap out of memory" error.
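You can check the limit on your own machine — a minimal sketch using the built-in v8 module (the flag value at the end is an example, not a recommendation):

const v8 = require('v8');

// heap_size_limit = the ceiling V8 will allow before aborting
const limitMB = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
console.log(`V8 heap limit: ${limitMB.toFixed(0)} MB`);

// Raise the old-space ceiling at startup if you genuinely need more:
//   node --max-old-space-size=4096 server.js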
Here is the critical misconception beginners have: "I used fs.readFile which is async, so I'm safe." Wrong. fs.readFile is non-blocking, meaning it does not freeze your event loop while reading — but it still loads the entire file into a single Buffer in memory before calling your callback. Async only means "don't block the event loop." It says nothing about memory.
fs.readFile is like photocopying an entire 2000-page book before you read a single word. You need a warehouse of space for the copies before you can start.
fs.createReadStream is like reading the book normally: open to page 1, read it, move to page 2. You only ever need a desk — regardless of whether the book has 200 pages or 20,000.
Picture 100 concurrent requests, each calling fs.readFile on a 50 MB file: that holds 100 × 50 MB = 5 GB in memory simultaneously. With streaming, the same 100 requests use 100 × 64 KB ≈ 6.4 MB. The difference is not academic — one crashes, one doesn't.
| Approach | RAM for a 2 GB file | When it fails |
|---|---|---|
| fs.readFileSync | ~2 GB allocated at once | Any file larger than available heap |
| fs.readFile (async) | ~2 GB — still all in memory | Same problem, just non-blocking |
| createReadStream | ~64 KB (default highWaterMark) | Never (constant regardless of file size) |
| readline interface | ~1 line at a time | Never |
const fs = require('fs');
const readline = require('readline');

// ❌ Loads entire 2 GB log file into memory
const content = fs.readFileSync('huge.log', 'utf8');
const errorLines = content.split('\n').filter(l => l.includes('ERROR'));

// ✅ Streams line by line — constant ~64 KB memory
// (top-level await requires an ES module or an async wrapper)
const rl = readline.createInterface({
  input: fs.createReadStream('huge.log'),
  crlfDelay: Infinity
});
const errors = [];
for await (const line of rl) {
  if (line.includes('ERROR')) errors.push(line);
}

// ✅ Stream a file download directly to disk without buffering in memory
const { pipeline } = require('stream/promises');
const https = require('https');
https.get('https://example.com/huge-file.zip', async (response) => {
  await pipeline(response, fs.createWriteStream('huge-file.zip'));
  // the HTTP response IS a Readable stream — no temp buffer needed
});
What is stream.pipeline()? How does it differ from pipe() in error handling and resource cleanup?

When your program opens a file, the operating system assigns it a small integer called a file descriptor (FD). Think of it as a locker key at a train station — the OS keeps a table of who is using which file, and the FD is the key number. Your Node process has a limited supply of these keys — typically 1024 by default on Linux/macOS.
When you are done with a file, you must close it — return the key. If your code errors out mid-stream and forgets to close the file, that FD is never returned. Each leaked FD silently occupies a slot in the OS table. This is a resource leak.
- With pipe(): an error is NOT forwarded to the WriteStream. FD #6 (the output file) stays open. Key #6 is never returned.
- Leaked FDs accumulate until the process hits EMFILE: too many open files. No new files, sockets, or connections can be opened. The server is broken.
- With pipeline(): any error destroys all streams in the chain automatically. FDs are returned. No leak. A single catch handles everything.

pipe() has been in Node.js since the very beginning. It is widely used in tutorials and examples. But it has a fundamental flaw: errors on one stream do not automatically destroy the other streams. This is a long-standing design mistake that pipeline() was built to fix. Always prefer pipeline() in production code.
| | pipe() | pipeline() |
|---|---|---|
| Errors propagate | No — each stream must handle its own 'error' event | Yes — one error destroys all streams |
| Stream cleanup on error | No — streams left open (file descriptor leak) | Yes — all streams automatically destroyed |
| Promise / callback | Synchronous return (the destination stream) | Callback or Promise (via stream/promises) |
| Introduced | Original Node.js | Node.js 10 (callback), Node 15 (promise version) |
const r = fs.createReadStream('in.txt'); const z = zlib.createGzip(); const w = fs.createWriteStream('out.gz'); r.pipe(z).pipe(w); // ❌ If r errors (file not found): // - Error event emitted on r // - z and w are NOT destroyed // - w's file descriptor stays open → resource leak! // - Unhandled 'error' event crashes the process // To do it properly with pipe() you need this boilerplate: [r, z, w].forEach(s => s.on('error', err => { [r, z, w].forEach(s => s.destroy()); console.error(err); }));
const { pipeline } = require('stream/promises'); const fs = require('fs'); const zlib = require('zlib'); async function compress(input, output) { try { await pipeline( fs.createReadStream(input), zlib.createGzip(), fs.createWriteStream(output) ); console.log('Compressed successfully'); } catch (err) { console.error('Compression failed:', err.message); // All three streams are already destroyed — no cleanup needed } } // pipeline also accepts async generators as stages: await pipeline( fs.createReadStream('data.csv'), async function* (source) { for await (const chunk of source) { yield chunk.toString().toUpperCase(); // transform inline } }, fs.createWriteStream('data-upper.csv') );
"Always prefer pipeline() over pipe(). pipe() doesn't propagate errors — if any stream in the chain errors, the others are left open, leaking file descriptors. pipeline() handles it automatically: one error destroys all streams in the chain. Use the promise version from stream/promises for clean async/await syntax."
Everything in Segments 1–3 relied on one fact: Node.js never blocks its event loop on I/O. That works brilliantly for waiting on files, databases, and networks. But there is a completely different category of work — CPU-bound computation — where no I/O is involved and the event loop cannot help.
I/O-bound work: reading a file, querying a DB, making an HTTP request. The CPU sits idle while waiting for the response. Node offloads the wait to libuv and handles thousands of these concurrently.
CPU-bound work: image resizing, video encoding, heavy JSON parsing, cryptographic hashing, ML inference. The CPU itself is busy — 100% — with no I/O to yield on. The event loop is frozen until it finishes.
Your Node.js process is like a single chef running a busy restaurant. The chef is great at multitasking — starting a dish, putting it in the oven, then starting another while the first bakes. That's I/O-bound work: the oven waits, the chef keeps moving.
But imagine the chef stops to solve a 10,000-piece jigsaw puzzle. While solving it, the chef cannot plate food, take orders, or respond to anyone. Every customer waits. That is CPU-bound work on the main thread — one calculation blocks every other request until it finishes.
Why doesn't async/await fix CPU-bound work?

This is one of the most common Node.js misconceptions. async/await and Promises only help with waiting — they free up the event loop while your code waits for something external (a file, a network response, a timer). They do not split computation across multiple CPU cores.
Imagine you need to add up a billion numbers. Making this function async does nothing — the CPU still has to do every single addition, one by one, on the same thread. await only yields control to the event loop at an await point. If there is no I/O to wait for, there is no yield point, and the loop stays frozen.
// ❌ This FREEZES the event loop for ~2 seconds on a modern CPU // Every other request waits. Health checks time out. Metrics drop. app.get('/hash', (req, res) => { const result = expensiveHash(req.body.data); // runs for 2s on main thread res.json({ result }); }); // ❌ Making it async does NOT fix the problem // "await" only helps if there's I/O to yield on app.get('/hash', async (req, res) => { const result = await expensiveHash(req.body.data); // still 2s on main thread! res.json({ result }); });
A Worker Thread is a completely separate OS thread with its own V8 instance and event loop. When you offload CPU work to a worker, the main thread's event loop is free to handle new requests while the worker runs the computation in parallel.
The offload pattern: the main thread creates a Worker and hands it the input via workerData or postMessage(). The worker runs the computation and sends the result back via postMessage(). The main thread receives it and responds.

const { Worker } = require('worker_threads');

app.get('/hash', (req, res) => {
  const worker = new Worker('./hash-worker.js', {
    workerData: { data: req.body.data }
  });
  worker.once('message', result => res.json({ result }));
  worker.once('error', err => res.status(500).json({ error: err.message }));
  // Main thread event loop is FREE while the worker runs
});
"async/await only helps with I/O — it yields the event loop while waiting for external resources. CPU-bound work has nothing to yield on, so it freezes the loop. Worker Threads run computation on a separate OS thread with its own V8 instance, leaving the main event loop free to handle new requests."
How do worker_threads, child_process, and cluster differ? When do you use each?

Node.js has three different mechanisms for doing work outside the main event loop. Beginners often confuse them because they all "run code somewhere else." The key is understanding what they share, how they communicate, and what they cost.
worker_threads: Your colleague sits at the same desk, shares your files and whiteboard, and whispers results to you. Fast, low overhead, same process memory.
child_process: You hire a contractor who works in a different office. They get their own computer, can run a totally different program (Python, Bash), and send you reports by email. Isolated, flexible, but slower to set up.
cluster: You clone yourself. Multiple copies of the exact same program run simultaneously, all answering calls on the same phone number. Each clone is independent — a crash in one doesn't affect the others.
| | worker_threads | child_process | cluster |
|---|---|---|---|
| Process | Same process | Separate process | Separate process (fork of main) |
| Memory | Shared heap possible (SharedArrayBuffer) | Completely separate | Completely separate |
| Communication | postMessage (fast) | IPC / stdin-stdout (slower) | IPC message passing |
| Language | JavaScript only | Any (Python, Ruby, shell) | JavaScript only |
| Use case | CPU-bound JS computation | Run external programs, shell commands | Scale HTTP servers across CPU cores |
| Overhead | Low (shared process) | High (spawn a process) | High (fork a process) |
When using worker_threads, you write two separate pieces of code: the main thread that creates the worker and listens for results, and the worker script that does the computation and posts results back. Both communicate via postMessage() and the 'message' event.
const { Worker, isMainThread, workerData } = require('worker_threads'); // isMainThread is true when this code runs on the main thread if (isMainThread) { const worker = new Worker(__filename, { // run THIS file as worker too workerData: { n: 42 } // data passed to the worker }); worker.on('message', (result) => { console.log('Fibonacci result:', result); // 267914296 }); worker.on('error', (err) => { console.error('Worker error:', err); }); worker.on('exit', (code) => { if (code !== 0) console.error('Worker stopped with exit code', code); }); console.log('Main thread continues — not blocked!'); } else { // This branch runs inside the worker thread const { parentPort, workerData } = require('worker_threads'); function fib(n) { return n <= 1 ? n : fib(n - 1) + fib(n - 2); // expensive! } const result = fib(workerData.n); // workerData = { n: 42 } parentPort.postMessage(result); // send result to main thread }
// Main thread sends progress requests, worker reports back const worker = new Worker('./worker.js'); worker.on('message', ({ type, data }) => { if (type === 'progress') console.log('Progress:', data + '%'); if (type === 'result') console.log('Done:', data); }); worker.postMessage({ cmd: 'start', payload: 'large dataset' }); // worker.js parentPort.on('message', ({ cmd, payload }) => { if (cmd === 'start') { for (let i = 0; i <= 100; i += 10) { doChunk(payload, i); parentPort.postMessage({ type: 'progress', data: i }); } parentPort.postMessage({ type: 'result', data: 'processed' }); } });
When you call postMessage(data), Node does not give the receiving thread a reference to the same object. It creates a full deep copy using the Structured Clone Algorithm. Both threads then hold their own independent copy. A change on one side does not affect the other.
Sending data via postMessage is like photocopying a document and handing the copy to a colleague. You both have the information, but you now have two separate documents. Writing on your copy doesn't change theirs. This is safe — but copying a 100 MB Buffer takes time and memory.
// ✅ Structured Clone handles all of these: worker.postMessage('hello'); // string worker.postMessage({ a: 1, b: [2, 3] }); // plain objects/arrays worker.postMessage(Buffer.from('data')); // Buffer (copied) worker.postMessage(new Map([['key', 'val']])); // Map, Set, Date, RegExp worker.postMessage(new Error('oops')); // Error objects // ❌ Structured Clone CANNOT handle these: worker.postMessage(() => {}); // Functions — throws DataCloneError worker.postMessage(Promise.resolve()); // Promises — throws DataCloneError worker.postMessage(myClassInstance); // Custom class methods are lost
For large binary data, copying is expensive. Transferable objects (like ArrayBuffer) can be transferred instead of copied. The original thread loses access to the data — ownership is moved to the receiving thread. This is instantaneous, regardless of size.
// ❌ Copy — 100 MB Buffer is duplicated (slow, doubles memory usage) const buf = Buffer.allocUnsafe(100 * 1024 * 1024); worker.postMessage({ buf }); // copies all 100 MB // ✅ Transfer — zero-copy, instantaneous, buf is now empty here const ab = new ArrayBuffer(100 * 1024 * 1024); worker.postMessage({ ab }, [ab]); // second arg = transferList console.log(ab.byteLength); // 0 — ownership transferred, can't use here // Transferable types: ArrayBuffer, MessagePort, ImageBitmap, OffscreenCanvas
"By default, postMessage uses the Structured Clone Algorithm to deep-copy the data — both threads have independent copies. For large binary data, you can use Transferable objects: ownership of the ArrayBuffer is moved to the receiving thread instantly with zero copying. The sending thread can no longer access the data."
What is SharedArrayBuffer? How does it differ from a regular ArrayBuffer sent via postMessage?

A regular ArrayBuffer has exactly one owner at a time. When you postMessage it (with transfer), ownership moves to the receiving thread — the sender loses it. SharedArrayBuffer has no single owner: multiple threads can read and write to it simultaneously, with no copying at all.
A regular ArrayBuffer transferred via postMessage is like handing a document to a colleague — they now own it, you don't.
A SharedArrayBuffer is like a whiteboard in a shared office. Everyone can see it and write on it at any time. This is extremely fast — but dangerous. If two people write different things at the same moment, the result is garbled. That's why you need Atomics.
// Main thread creates the shared buffer const sharedBuffer = new SharedArrayBuffer(4); // 4 bytes = one Int32 const view = new Int32Array(sharedBuffer); const worker = new Worker('./worker.js', { workerData: { sharedBuffer } // buffer is SHARED, not copied }); view[0] = 0; // initial value worker.on('message', () => { console.log('Value written by worker:', view[0]); // sees worker's write! }); // worker.js const { workerData, parentPort } = require('worker_threads'); const view = new Int32Array(workerData.sharedBuffer); view[0] = 42; // written directly to shared memory parentPort.postMessage('done'); // notify main thread
| Method | Mechanism | Speed | Safe? |
|---|---|---|---|
| postMessage (copy) | Deep clone via Structured Clone | Slow for large data | Yes — independent copies |
| Transfer ArrayBuffer | Move ownership, zero-copy | Instant | Yes — one owner at a time |
| SharedArrayBuffer | Both threads access same memory | Fastest (no copying ever) | Only with Atomics |
In browser environments, SharedArrayBuffer requires special HTTP response headers (Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy) due to Spectre vulnerability mitigations. In Node.js (no browser sandbox), it is available without restrictions.
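For completeness, if you ever serve a browser page that uses SharedArrayBuffer from a Node server, these are the two headers browsers require (standard values; not needed for worker_threads):

// Inside any Node request handler serving the page:
res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');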
What are Atomics? Why do you need them when using SharedArrayBuffer?

When two threads share memory and both try to write at the same time, you get a race condition. The result depends on which thread gets there first — and that is unpredictable. Even a simple increment (value++) is actually three operations: read, add, write. Two threads doing this simultaneously can overwrite each other's work.
Thread A: reads value = 0
Thread B: reads value = 0 ← both read before either writes
Thread A: adds 1 → result = 1
Thread B: adds 1 → result = 1
Thread A: writes 1
Thread B: writes 1 ← overwrites Thread A's write!
Final value: 1 (expected: 2) ← one increment was lost!
Atomics provides operations that the CPU executes as a single, uninterruptible unit. No other thread can read or write the same memory location while an atomic operation is in progress. The read-add-write is guaranteed to complete as one step.
const sab = new SharedArrayBuffer(4); const view = new Int32Array(sab); // ❌ Not safe — two threads can race view[0]++; // ✅ Safe — atomic add: read+add+write as one uninterruptible step Atomics.add(view, 0, 1); // add 1 to index 0 Atomics.sub(view, 0, 1); // subtract 1 Atomics.store(view, 0, 99); // write 99 Atomics.load(view, 0); // read safely Atomics.compareExchange(view, 0, 99, 100); // if 99, swap to 100 // Atomics.wait — blocks the thread until value changes (like a mutex) Atomics.wait(view, 0, 0); // sleep until view[0] != 0 Atomics.notify(view, 0, 1); // wake one waiting thread
"SharedArrayBuffer gives multiple threads direct access to the same memory — no copying. But operations like value++ are three steps (read, add, write) that can interleave between threads, causing race conditions. Atomics provides operations that execute as a single indivisible CPU instruction, making shared-memory manipulation safe."
Creating a Worker Thread is expensive — Node must spin up a new V8 engine, load and parse your script, and initialise the Node environment. Doing this for every HTTP request adds ~100–200 ms overhead per request and wastes memory. The solution is a pool: create a fixed number of workers upfront, reuse them for every task, and queue tasks when all workers are busy.
Instead of hiring a new specialist for every task (expensive, slow), you hire a team of 4 specialists who are always available. When a task arrives, you hand it to any available specialist. If all 4 are busy, you queue the task and hand it to the first one who finishes. This is a thread pool.
const { Worker } = require('worker_threads'); const os = require('os'); class WorkerPool { constructor(workerFile, size = os.cpus().length) { this.queue = []; // pending tasks this.workers = []; // all worker instances this.free = []; // idle workers for (let i = 0; i < size; i++) { const worker = new Worker(workerFile); this.workers.push(worker); this.free.push(worker); } } run(data) { return new Promise((resolve, reject) => { const task = { data, resolve, reject }; if (this.free.length) { this._dispatch(this.free.pop(), task); } else { this.queue.push(task); // all busy — wait in queue } }); } _dispatch(worker, { data, resolve, reject }) { worker.once('message', (result) => { resolve(result); if (this.queue.length) { this._dispatch(worker, this.queue.shift()); // next task } else { this.free.push(worker); // return to idle pool } }); worker.once('error', reject); worker.postMessage(data); } } // Usage const pool = new WorkerPool('./hash-worker.js'); // one worker per core const result = await pool.run({ input: 'data to hash' });
piscina npm package is the de-facto production worker pool. It handles error recovery, worker crashes, task timeouts, and concurrency limits. Use it instead of rolling your own unless you have a specific reason.
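A minimal sketch of what using piscina looks like — the worker file name and the expensiveHash function are hypothetical:

const path = require('path');
const Piscina = require('piscina');

const pool = new Piscina({
  filename: path.resolve(__dirname, 'hash-worker.js'), // worker module
  maxThreads: 4
});

// Inside an async function:
const result = await pool.run({ input: 'data to hash' });

// hash-worker.js — a piscina worker just exports the function to run per task:
// module.exports = ({ input }) => expensiveHash(input); // expensiveHash: hypothetical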
An unhandled exception inside a Worker Thread does not crash the main process — it emits an 'error' event on the Worker object and then an 'exit' event with a non-zero code. If you do not listen for 'error', Node.js throws an uncaught exception on the main thread (which could crash it). Always attach an error handler.
const worker = new Worker('./worker.js', { workerData: { n: 10 } }); worker.on('message', (result) => { console.log('Result:', result); }); worker.on('error', (err) => { // Worker threw an unhandled exception console.error('Worker error:', err.message); // Spawn a replacement worker if needed }); worker.on('exit', (code) => { if (code !== 0) { console.error('Worker exited with code', code); // code 0 = clean exit, non-zero = crash or process.exit(N) } }); // Inside the worker — propagate errors explicitly: parentPort.on('message', async (data) => { try { const result = await doWork(data); parentPort.postMessage({ ok: true, result }); } catch (err) { // Don't throw — postMessage the error so main thread can handle it parentPort.postMessage({ ok: false, error: err.message }); } });
// Terminate a worker that is taking too long (timeout pattern) const timeout = setTimeout(() => { worker.terminate(); // forcefully ends the worker thread reject(new Error('Worker timed out')); }, 5000); worker.once('message', (result) => { clearTimeout(timeout); resolve(result); });
Node.js actually uses threads internally already — but they are hidden from JavaScript. libuv maintains a thread pool (default: 4 threads) for certain async operations like DNS lookups, fs.readFile, and crypto. These threads are invisible to your JavaScript — you never interact with them directly. Worker Threads are something entirely different and user-controlled.
| | libuv Thread Pool | worker_threads |
|---|---|---|
| Purpose | Offload I/O and blocking system calls for Node internals | Run JavaScript code in parallel |
| Controlled by | Node.js / libuv (automatic) | Your code |
| Runs | C/C++ code (file system, crypto) | JavaScript (your own scripts) |
| Size | 4 by default (UV_THREADPOOL_SIZE) | As many as you create |
| JavaScript accessible | No — transparent | Yes — full JS environment |
# Increase before starting Node — must be set before any I/O operations UV_THREADPOOL_SIZE=16 node server.js // Or in code (must be set VERY early, before any I/O): process.env.UV_THREADPOOL_SIZE = '16'; // Useful when: many concurrent crypto operations, many simultaneous // fs calls, or many DNS lookups — these all compete for the 4 libuv threads. // Unlike Worker Threads, increasing this does NOT help pure JS computation.
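You can watch the pool limit in action with a small experiment (timings vary by machine): fire eight slow async crypto calls at once and they complete in batches of four.

const crypto = require('crypto');

const start = Date.now();
for (let i = 1; i <= 8; i++) {
  // Async pbkdf2 runs on the libuv thread pool, not the event loop
  crypto.pbkdf2('password', 'salt', 1_000_000, 64, 'sha512', () => {
    console.log(`#${i} finished after ${Date.now() - start} ms`);
  });
}
// With the default UV_THREADPOOL_SIZE=4, results arrive in two waves:
// four at roughly T ms, then the remaining four at roughly 2T ms.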
"libuv's thread pool and worker_threads are completely separate. libuv's pool (default 4 threads) is used internally by Node for blocking C-level operations like file I/O and crypto — your JS never sees these. worker_threads is your API for running JavaScript in parallel across cores. They do not share the same pool."
Worker Threads are not free. Spawning a new Worker has real overhead: starting a V8 engine, loading Node.js internals, parsing your script. For tasks that take less than the startup cost, adding a worker makes things slower.
Good candidates — tasks that take >10 ms of pure CPU time: image/video processing, ML inference, cryptographic key generation, heavy data parsing (large JSON/CSV), mathematical simulations, compression.
Bad candidates — any I/O-bound work (file reads, DB queries, HTTP calls); short calculations (<1 ms); tasks where serialization cost exceeds computation time; per-request workers without a pool.
| Cost | Rough number | Implication |
|---|---|---|
| Worker startup | ~50–150 ms | Always pool workers — never spawn per-request |
| Worker memory | ~10–30 MB per thread | Don't create more workers than CPU cores |
| postMessage serialization | ~1 ms per MB | Use Transferable for large data, SharedArrayBuffer for hot paths |
| Context switch | ~1–10 µs | Negligible unless switching thousands of times per second |
1. Is this work CPU-bound (not waiting on I/O)? → if no: async/await is enough
2. Does it take > a few milliseconds? → if no: overhead > benefit
3. Does it happen frequently enough to warrant a pool?
4. Is the data small enough that serialization cost is acceptable?
(or can I use SharedArrayBuffer / Transferable?)
5. Can I structure the task as a standalone script?
(workers can't share closures, class instances, or open DB connections)
"Worker Threads are the right tool for long-running CPU-bound JavaScript. The main pitfalls are: spawning a new worker per request (use a pool), sending large data via postMessage without Transferables (copies are slow), using workers for I/O (async handles that better), and not accounting for the ~10–30 MB memory overhead per thread. More workers than CPU cores just adds context-switching overhead without extra parallelism."
HTTP is the language your browser and server use to talk to each other. But under the hood it is just text sent over a TCP connection. Understanding this layered model is the key to understanding everything in this segment.
TCP is the postal system — it guarantees the letter arrives, in order, without damage.
TLS is sealing the letter in a tamper-evident envelope so only the recipient can read it.
HTTP is the agreed format of the letter: "Dear server, please send me /index.html. Signed, Browser."
How does http.createServer work internally? Walk through the full lifecycle from TCP connection to HTTP response.

1. A TCP connection is accepted and Node's HTTP parser reads bytes as they arrive, parsing the request line (GET /path HTTP/1.1) and headers. This is streaming — it does not wait for the body.
2. Once headers are complete, Node creates an IncomingMessage (the request object) and a ServerResponse (the response object) and calls your requestListener(req, res).
3. Body chunks arrive via the 'data' event on req. You must consume it — it is a Readable stream.
4. You reply with res.writeHead() for status + headers, then res.write()/res.end() for the body. Calling res.end() signals that the response is complete.
5. With keep-alive, the socket is reused for the next request; on Connection: close, the socket is destroyed.

── REQUEST (browser → server) ──────────────────────────────────
GET /users/42 HTTP/1.1\r\n
Host: api.example.com\r\n
Accept: application/json\r\n
Connection: keep-alive\r\n
\r\n ← blank line = end of headers
(no body for GET)
── RESPONSE (server → browser) ─────────────────────────────────
HTTP/1.1 200 OK\r\n
Content-Type: application/json\r\n
Content-Length: 27\r\n
\r\n
{"id":42,"name":"Alice"} ← body
const http = require('http'); const server = http.createServer((req, res) => { // req = IncomingMessage (Readable stream) // res = ServerResponse (Writable stream) console.log(req.method, req.url); // "GET /users/42" console.log(req.headers['accept']); // "application/json" console.log(req.httpVersion); // "1.1" // Route matching (what Express does under the hood) if (req.method === 'GET' && req.url === '/health') { res.writeHead(200, { 'Content-Type': 'application/json' }); res.end(JSON.stringify({ status: 'ok' })); return; } // Default 404 res.writeHead(404, { 'Content-Type': 'text/plain' }); res.end('Not Found'); }); server.listen(3000, () => console.log('Server on :3000')); // server.listen() calls the OS bind() + listen() syscalls // Node then polls for new connections via libuv
Express is just a function with the requestListener(req, res) signature. It wraps the bare req/res objects with convenience methods (req.params, res.json()), runs middleware in sequence, and provides a routing table. Under the hood it still calls http.createServer(app).
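To make that concrete — a short sketch showing that an Express app is just a requestListener:

const express = require('express');
const http = require('http');

const app = express(); // app is literally a (req, res) function
app.get('/health', (req, res) => res.json({ status: 'ok' }));

// These two are equivalent ways to start the server:
http.createServer(app).listen(3000);
// app.listen(3000); — internally calls http.createServer(this).listen(...)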
Unlike headers (which are fully parsed before your callback fires), the request body arrives in chunks after the headers. You must listen to the 'data' event, collect the chunks, and reassemble them on 'end'. Forgetting to consume the body can cause the connection to stall or memory to grow.
const http = require('http');
const { URL } = require('url');

http.createServer(async (req, res) => {
  // ── 1. Parse URL and query parameters ───────────────────────────
  const url = new URL(req.url, 'http://localhost');
  const path = url.pathname;                 // "/users/42"
  const page = url.searchParams.get('page'); // "?page=2" → "2"

  // ── 2. Route parameter extraction (manual regex) ─────────────────
  const match = path.match(/^\/users\/(\d+)$/);
  const userId = match?.[1];                 // "42" (string — remember to parseInt)

  // ── 3. Read headers ──────────────────────────────────────────────
  const contentType = req.headers['content-type'];  // always lowercase
  const auth = req.headers['authorization'];

  // ── 4. Parse the request BODY (streaming) ────────────────────────
  const parseBody = (req) => new Promise((resolve, reject) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => resolve(Buffer.concat(chunks).toString()));
    req.on('error', reject);
  });

  if (req.method === 'POST') {
    const rawBody = await parseBody(req);
    if (contentType?.includes('application/json')) {
      const body = JSON.parse(rawBody);            // now you have your object
    } else if (contentType?.includes('application/x-www-form-urlencoded')) {
      const params = new URLSearchParams(rawBody); // form fields
    }
  }
  res.end('ok');
}).listen(3000);
const parseBody = (req, maxBytes = 1 * 1024 * 1024) => new Promise((resolve, reject) => { const chunks = []; let total = 0; req.on('data', (chunk) => { total += chunk.length; if (total > maxBytes) { req.destroy(); // abort the connection immediately reject(new Error('Request body too large')); return; } chunks.push(chunk); }); req.on('end', () => resolve(Buffer.concat(chunks).toString())); req.on('error', reject); });
Node offers several ways to make outgoing HTTP requests: the built-in http.request / http.get, the native fetch, and third-party clients.

| Method | Available since | Pros | Cons |
|---|---|---|---|
| http.request / http.get | Node.js v0.1 | No dependencies, full control | Very verbose, no JSON helpers, manual stream handling |
| Native fetch | Node.js v18 (stable v21) | No deps, browser-compatible API, Promise-based | No built-in request timeout (needs AbortController) |
| axios | npm package | Interceptors, auto JSON, nice API, wide adoption | Extra dependency |
| got / node-fetch | npm package | Retry logic, hooks, streams support | Extra dependency |
const https = require('https'); const req = https.request({ hostname: 'api.github.com', path: '/users/torvalds', method: 'GET', headers: { 'User-Agent': 'Node.js' } }, (res) => { // res is a Readable stream of the response body console.log('Status:', res.statusCode); const chunks = []; res.on('data', chunk => chunks.push(chunk)); res.on('end', () => { const body = JSON.parse(Buffer.concat(chunks).toString()); console.log(body.name); // "Linus Torvalds" }); }); req.on('error', err => console.error(err)); req.end(); // must call end() to signal no body (even for GET)
// Native fetch — available globally in Node 18+ const getUser = async (id) => { const controller = new AbortController(); const timeout = setTimeout(() => controller.abort(), 5000); // 5s timeout try { const res = await fetch(`https://api.example.com/users/${id}`, { signal: controller.signal, headers: { 'Authorization': `Bearer ${process.env.TOKEN}` } }); if (!res.ok) throw new Error(`HTTP ${res.status}`); return await res.json(); } finally { clearTimeout(timeout); } };
How does http.Agent manage connection pooling, and why does it matter?

Every new TCP connection requires a 3-way handshake before any data can flow. For HTTPS, add the TLS handshake on top. Together these cost 1–3 network round trips before your actual request even starts. If you make 100 requests to the same server and create a new connection for each one, you pay this overhead 100 times.
Opening a new TCP connection per request is like taking a taxi every time you need to go somewhere — you pay the "starting fee" every time. Keep-alive is like having a regular driver: the connection stays open between trips, so you only pay the startup cost once and then just travel.
Without keep-alive (HTTP/1.0 default):
Request 1: SYN → SYN-ACK → ACK → GET /a → 200 → FIN (connection closed)
Request 2: SYN → SYN-ACK → ACK → GET /b → 200 → FIN (new connection!)
Request 3: SYN → SYN-ACK → ACK → GET /c → 200 → FIN (new connection!)
With keep-alive (HTTP/1.1 default):
SYN → SYN-ACK → ACK
→ GET /a → 200
→ GET /b → 200 ← reusing same socket, no new handshake
→ GET /c → 200
→ FIN (one close for all three)
When you make outgoing requests with http.request(), Node uses a global http.globalAgent behind the scenes. The Agent manages a pool of open TCP sockets per host:port, reusing them for multiple requests. Key settings:
const https = require('https');

const agent = new https.Agent({
  keepAlive: true,      // reuse sockets (default: false in Node <19)
  maxSockets: 50,       // max concurrent connections per host:port
  maxFreeSockets: 10,   // max idle sockets to keep in pool
  timeout: 60000        // socket timeout in ms
});

// Pass the agent to each request:
https.get('https://api.example.com/data', { agent }, (res) => { /* ... */ });

// Or enable keep-alive globally for all https requests:
https.globalAgent.keepAlive = true;

// ⚠ agent: false means NO pooling — new socket per request.
// Use this only for one-off requests or when testing.
https.get('https://api.example.com/one-off', { agent: false }, (res) => { /* ... */ });

// Note: the native fetch does NOT use http.Agent — it is built on undici,
// which manages its own connection pool.
maxSockets is Infinity in Node.js. Under load this means your server can open thousands of outbound connections simultaneously — exhausting OS file descriptor limits or overwhelming the target service. Always set a sensible maxSockets in production.
Encryption: all data between client and server is encrypted using symmetric keys derived during the handshake. An eavesdropper on the network sees only garbled bytes.
Authentication: the server's certificate proves it is who it claims to be. The client verifies the certificate was signed by a trusted Certificate Authority (CA). This prevents man-in-the-middle attacks.
const https = require('https'); const fs = require('fs'); const options = { key: fs.readFileSync('server.key'), // private key (keep secret!) cert: fs.readFileSync('server.cert'), // public certificate (share freely) // Modern TLS hardening: minVersion: 'TLSv1.2', // reject TLS 1.0 / 1.1 (deprecated, insecure) ciphers: 'TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256' // TLS 1.3 only }; https.createServer(options, (req, res) => { res.end('Secure!'); }).listen(443); // In production: use nginx/Caddy as TLS terminator in front of Node. // Node handles HTTP on port 3000; nginx handles TLS on port 443 and proxies.
// Server requires a client certificate (used for service-to-service auth) https.createServer({ key: fs.readFileSync('server.key'), cert: fs.readFileSync('server.cert'), ca: fs.readFileSync('ca.cert'), // trusted CA for clients requestCert: true, // ask for client cert rejectUnauthorized: true // reject if cert invalid }, (req, res) => { const cert = req.socket.getPeerCertificate(); console.log('Client CN:', cert.subject.CN); res.end('Hello trusted client'); }).listen(443);
What problems does HTTP/2 solve, and how do you use the http2 module in Node.js?

Head-of-line blocking: HTTP/1.1 is serial — on one connection, you send request 1, wait for response 1, then send request 2. If response 1 is slow (large file, slow DB), everything queues behind it. Workaround: open 6–8 parallel TCP connections per host — wasteful.
Header bloat: HTTP headers are sent as plain text with every single request — even identical Cookie, User-Agent, and Accept headers. A typical header block is 500–2000 bytes, repeated thousands of times per session.
HTTP/2 fixes both: multiplexing runs many requests concurrently on one connection, and HPACK header compression means that even if you send the same Authorization header 100 times, subsequent sends cost almost nothing — the receiver already has it in a shared table.

HTTP/1.1 (one connection, serial):
──[GET /api/users]──[response]──[GET /api/posts]──[response]──
HTTP/2 (one connection, multiplexed streams):
Stream 1: ──[GET /api/users]─────────[response]──
Stream 2: ──[GET /api/posts]──[response]──
Stream 3: ──[GET /static/app.js]─────[response]──
All running in parallel on the same TCP connection
const http2 = require('http2'); const fs = require('fs'); // HTTP/2 requires TLS in browsers (h2c = cleartext only for testing) const server = http2.createSecureServer({ key: fs.readFileSync('server.key'), cert: fs.readFileSync('server.cert') }); server.on('stream', (stream, headers) => { // Each request is a "stream" with an ID (1, 3, 5, 7...) const path = headers[':path']; // pseudo-header (HTTP/2 specific) const method = headers[':method']; if (path === '/') { // Server push — send CSS before browser requests it stream.pushStream({ ':path': '/style.css' }, (err, push) => { if (!err) { push.respond({ ':status': 200, 'content-type': 'text/css' }); push.end('body { color: red }'); } }); } stream.respond({ ':status': 200, 'content-type': 'text/html' }); stream.end('<html>Hello HTTP/2!</html>'); }); server.listen(443);
"HTTP/1.1 suffers from head-of-line blocking — one slow response blocks the connection — and header bloat. HTTP/2 solves both: multiplexing lets multiple requests share one TCP connection concurrently, and HPACK compresses repeated headers. In Node.js, the http2 module exposes a 'stream' event instead of a request event, since each H2 stream maps to one request/response pair."
HTTP was designed for a simple pattern: client asks, server answers, connection closes (or is reused for the next ask). The server can never spontaneously send the client data — it can only respond. This makes real-time features (chat, live scores, collaborative editing) awkward to implement over plain HTTP.
Long-polling: Client sends request. Server holds it open until data is ready, then responds. Client immediately sends a new request. Like repeatedly asking "are we there yet?" — works, but wasteful.
Server-Sent Events (SSE): Client makes one HTTP request. Server keeps the response open and streams data one-way (server → client only). Like a radio broadcast — great for notifications, useless for chat. (A minimal SSE sketch follows this list.)
WebSocket: Client upgrades the HTTP connection to a persistent full-duplex channel. Both sides can send messages any time, with no request-response overhead. Like a phone call.
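WebSocket gets a full code example later in this segment; SSE needs nothing beyond core http, so here is a minimal sketch of an SSE endpoint (URL and payload are illustrative):

const http = require('http');

http.createServer((req, res) => {
  if (req.url === '/events') {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream', // the SSE content type
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    });
    // One long-lived response; each "data: ...\n\n" block is one event
    const timer = setInterval(() => {
      res.write(`data: ${JSON.stringify({ ts: Date.now() })}\n\n`);
    }, 1000);
    req.on('close', () => clearInterval(timer)); // client went away
    return;
  }
  res.end('ok');
}).listen(3000);

// Browser side: new EventSource('/events').onmessage = e => console.log(e.data);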
1. The client sends an HTTP GET with Upgrade: websocket and a randomly generated Sec-WebSocket-Key.
2. The server responds with 101 Switching Protocols. It computes a Sec-WebSocket-Accept value (SHA-1 of the key + a magic string) to prove it understands WebSocket — verified in the snippet after the trace below.

── CLIENT REQUEST ──────────────────────────────────────────────
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
── SERVER RESPONSE (101 = switching protocols) ─────────────────
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
── NOW SPEAKING WEBSOCKET FRAMES (not HTTP anymore) ────────────
[frame] [frame] [frame] ... (bidirectional, any time)
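The accept value is easy to verify yourself — SHA-1 of the client key concatenated with the fixed GUID from RFC 6455, base64-encoded (the key below is the RFC's own example):

const crypto = require('crypto');

const GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'; // fixed by the spec
const key = 'dGhlIHNhbXBsZSBub25jZQ==';               // Sec-WebSocket-Key from the client

const accept = crypto.createHash('sha1').update(key + GUID).digest('base64');
console.log(accept); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=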
| | Long-polling | SSE | WebSocket |
|---|---|---|---|
| Direction | Server → Client (one shot) | Server → Client (stream) | Full-duplex (both ways) |
| Protocol | HTTP | HTTP (text/event-stream) | WebSocket (ws://) |
| Overhead per message | Full HTTP headers each time | Tiny (after initial request) | 2–10 byte frame header |
| Browser support | All | All (except IE) | All modern browsers |
| Use case | Legacy fallback | Notifications, live feeds | Chat, gaming, collaboration |
Node's core has no built-in WebSocket server — the de-facto standard is the ws library approach.

const { WebSocketServer } = require('ws');
const http = require('http');

const httpServer = http.createServer((req, res) => {
  res.end('HTTP server running'); // handle regular HTTP requests
});

// Attach WebSocket server to the same HTTP server
const wss = new WebSocketServer({ server: httpServer });

wss.on('connection', (ws, req) => {
  const ip = req.socket.remoteAddress;
  console.log('New connection from', ip);

  ws.on('message', (data, isBinary) => {
    const msg = isBinary ? data : data.toString();
    console.log('Received:', msg);
    // Broadcast to all connected clients
    wss.clients.forEach(client => {
      if (client.readyState === client.OPEN) {
        client.send(msg);
      }
    });
  });

  ws.on('close', (code, reason) => {
    console.log('Disconnected:', code, reason.toString());
  });
  ws.on('error', err => console.error('WS error:', err));

  ws.send('Welcome!'); // server sends first
});

httpServer.listen(3000);
TCP does not detect dead connections immediately. A client can disappear (laptop lid closed, network cut) and the server won't know for minutes. A ping/pong heartbeat actively checks that the client is still alive.
function heartbeat() { this.isAlive = true; } wss.on('connection', (ws) => { ws.isAlive = true; ws.on('pong', heartbeat); // client responds to our ping }); // Every 30 seconds, ping all clients. Kill ones that didn't pong. setInterval(() => { wss.clients.forEach(ws => { if (ws.isAlive === false) { ws.terminate(); return; } ws.isAlive = false; ws.ping(); // ws library sends WS ping frame }); }, 30_000);
What is the net module? How do you build a raw TCP server, and when would you use it instead of HTTP?

The net module is Node's interface to raw TCP sockets. http is built on top of net — an HTTP server is just a TCP server that understands the HTTP text format. Using net directly gives you a raw byte stream with no protocol overhead. You design the protocol yourself.
Use net when: you are implementing a custom binary protocol — building a database driver, game server, message broker (like Redis), proxy, or IoT gateway. Any case where HTTP's text overhead is too much.
Use http when: you are building a REST API, a web server, a webhook receiver, or anything that talks to browsers or standard HTTP clients. HTTP gives you routing, headers, status codes, and caching for free.
const net = require('net'); // ── TCP Server ─────────────────────────────────────────────────── const server = net.createServer((socket) => { // socket is a Duplex stream — you can read and write bytes console.log('Client connected:', socket.remoteAddress); socket.on('data', (data) => { console.log('Received:', data.toString()); // raw bytes socket.write('Echo: ' + data); // write back }); socket.on('end', () => console.log('Client disconnected')); socket.on('error', (err) => console.error('Socket error:', err)); }); server.listen(8080, () => console.log('TCP server on :8080')); // ── TCP Client ─────────────────────────────────────────────────── const client = net.connect({ port: 8080 }, () => { client.write('Hello server!'); }); client.on('data', data => console.log(data.toString())); // "Echo: Hello server!"
// Server: listen on a file path instead of a port net.createServer(handler).listen('/tmp/myapp.sock'); // Client: connect via file path net.connect('/tmp/myapp.sock'); // Used by: nginx ↔ Node.js, PM2 process manager, Redis on localhost // ~30% faster than loopback TCP (127.0.0.1) for local IPC
How do you implement graceful shutdown, and what do ECONNRESET / ECONNREFUSED mean?

When you deploy a new version, your process manager (PM2, Kubernetes) sends SIGTERM. Calling process.exit() immediately kills all in-flight requests mid-response — users see broken pages. Graceful shutdown stops accepting new connections but lets existing ones finish.
1. Listen for SIGTERM. Stop accepting new connections: call server.close().
2. Wait for in-flight requests to finish — server.close(callback) fires when the last connection closes.
3. Close external resources (DB pool, Redis), then call process.exit(0).
4. Add a safety net: force exit if shutdown hangs, e.g. setTimeout(() => process.exit(1), 30000).

const server = http.createServer(app).listen(3000);

async function shutdown(signal) {
  console.log(`${signal} received — graceful shutdown`);
  await new Promise(resolve => server.close(resolve)); // stop new connections
  await db.close();         // close DB pool
  await redisClient.quit(); // close Redis connection
  console.log('Shutdown complete');
  process.exit(0);
}

process.on('SIGTERM', () => shutdown('SIGTERM'));
process.on('SIGINT', () => shutdown('SIGINT')); // Ctrl+C

// Safety net — force exit if graceful shutdown hangs
setTimeout(() => {
  console.error('Forced exit');
  process.exit(1);
}, 30_000).unref();
const server = http.createServer(handler); // 1. Keep-alive timeout — how long to hold an idle persistent connection server.keepAliveTimeout = 65_000; // must be > load balancer's idle timeout // 2. Headers timeout — how long to wait for request headers to arrive server.headersTimeout = 60_000; // 3. Request timeout — total time allowed for the full request cycle server.requestTimeout = 300_000; // Node 14+ // 4. Socket timeout — per-socket idle timeout (no data for N ms) server.setTimeout(120_000, (socket) => { socket.destroy(); // socket went idle — destroy it });
| Error code | Meaning | Common cause | Fix |
|---|---|---|---|
| ECONNREFUSED | Nothing listening on that port | Server not started, wrong port, DB down | Check the target service is running |
| ECONNRESET | Remote end forcibly closed the connection | Load balancer timeout, server crash, firewall | Retry with backoff; check keepAliveTimeout |
| ETIMEDOUT | No response within timeout | Slow network, overloaded server, firewall drop | Set a request timeout; add circuit breaker |
| ENOTFOUND | DNS lookup failed | Wrong hostname, DNS misconfiguration | Verify hostname; check /etc/resolv.conf |
| EMFILE | Too many open file descriptors | Connection pool not limiting sockets | Set maxSockets on Agent; increase OS fd limit |
"Graceful shutdown means stopping new connections via server.close(), waiting for in-flight requests to finish, closing DB/Redis connections, then calling process.exit(0) — with a hard timeout fallback. ECONNRESET means the remote side forcibly closed the socket — usually a load balancer idle timeout. Fix it by setting server.keepAliveTimeout slightly above the load balancer's value. ECONNREFUSED means nothing is listening on that port."
Never optimise what you have not measured. Most developers instinctively "feel" what is slow and optimise the wrong thing. Studies consistently show that programmers guess the bottleneck correctly less than 20% of the time. This segment teaches you to measure first, then fix with precision.
Latency = how long one request takes (milliseconds). Throughput = how many requests per second the system handles. Optimising one can hurt the other. A batch that waits to group 100 requests improves throughput but increases individual latency.
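Both numbers come out of a single load-test run — here with autocannon, which this segment uses later (the figures below are illustrative, not a benchmark):

npx autocannon -c 100 -d 10 http://localhost:3000
# Latency  p50: 12 ms   p99: 87 ms   ← how long one request takes
# Req/Sec  avg: ~8000                ← how many the system handles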
1. Code — algorithms, V8 optimisations, avoid GC pressure.
2. Process — event loop health, memory usage, concurrency limits.
3. Architecture — caching, clustering, load balancing, CDN.
You wouldn't operate on a patient without first diagnosing the problem. Profiling is the diagnosis. A flame graph is the X-ray. Only after you see exactly where time is being spent do you start fixing things. Guessing and optimising blindly is like operating on the wrong organ.
How do you profile a Node.js application? The main tools are --inspect, Chrome DevTools, and clinic.js.

| Tool | Best for | How to start |
|---|---|---|
| --inspect + Chrome DevTools | Deep CPU + memory analysis on any Node app | node --inspect server.js → open chrome://inspect |
| clinic.js (clinic doctor) | Automated diagnosis — tells you what kind of problem you have | npx clinic doctor -- node server.js |
| 0x (flame graphs) | Identifying hot functions — which code path eats CPU | npx 0x server.js |
1. Start your app with node --inspect server.js. Node opens a WebSocket debugger on port 9229 and prints the inspector URL.
2. In Chrome, open chrome://inspect. Click "inspect" under your Node process. The DevTools window opens connected to your live process.
3. In the Performance/Profiler tab, click Record, generate load (with autocannon or your browser), then click Stop.

clinic.js wraps your app, runs it under load, collects data, and then generates an HTML report with a diagnosis: "CPU bottleneck", "Event loop blocked", "I/O bottleneck", or "Memory leak". It is the fastest way to understand what kind of problem you have before going deeper.
# Install once npm install -g clinic # Run your server under clinic's harness clinic doctor -- node server.js # In another terminal, generate load npx autocannon -c 100 -d 30 http://localhost:3000 # When done, Ctrl+C — clinic opens HTML report automatically # It shows: event loop delay, CPU usage, memory over time # For detailed flame graphs: clinic flame -- node server.js # For async/await timing issues: clinic bubbleprof -- node server.js
node --inspect=0.0.0.0:9229 with an SSH tunnel to profile a live production server without restarting it. Never expose the debug port publicly — it gives full remote code execution access.
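A sketch of that tunnel (hostnames are placeholders; binding to 127.0.0.1 keeps the port off the network entirely):

# On the server — bind the inspector to localhost only
node --inspect=127.0.0.1:9229 server.js
# (or activate the inspector on an already-running process: kill -USR1 <pid>)

# On your laptop — forward local port 9229 through SSH
ssh -L 9229:127.0.0.1:9229 user@prod-host

# Now chrome://inspect on your laptop reaches the remote process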
A profiler samples the call stack thousands of times per second. Each sample is one snapshot of "what functions are currently running, and which functions called them." A flame graph stacks all these samples and merges identical sequences.
┌─────────────────────────────────────────────────────────┐
│ JSON.parse [████████████████] ← WIDE = SLOW │
│ parseBody [████████████████████] │
│ routeHandler [██████████████████████████] │
│ http.Server.emit [████████████████████████████████] │
└─────────────────────────────────────────────────────────┘
X-axis → NOT time order — it is sorted alphabetically
(so adjacent bars are NOT related in time)
Y-axis → call stack depth (bottom = root, top = leaf)
Width → proportion of CPU time spent in that function
(wider = more samples = more CPU consumed)
🔴 Hot path: wide bars near the TOP of the stack
These are the actual bottlenecks — the functions doing real work.
🟡 Wide bars in the MIDDLE: a function that calls slow children
(fix the children, not the parent)
🟣 "Plateau" bars: a function running flat across the graph
= blocking the event loop — nothing else ran during that time
JSON.stringify / JSON.parse → large objects, run in worker or use faster-json
RegExp → catastrophic backtracking — rewrite regex or use re2
crypto.pbkdf2Sync → blocking crypto — use async version or worker
fs.readFileSync → blocking I/O — replace with createReadStream
Array.sort on large arrays → move to worker or optimise comparator
Deep object cloning → avoid lodash.cloneDeep on hot paths
V8 does not interpret JavaScript line by line. It compiles it — smartly, in two phases: Ignition, a bytecode interpreter that starts executing your code immediately, and TurboFan, an optimising compiler that recompiles hot functions to machine code based on the types it has observed.
In a statically-typed language like C++, the compiler knows the exact memory layout of every struct at compile time — accessing a field is a single memory read at a fixed offset. JavaScript is dynamic — objects can have any properties added at any time. V8 solves this with hidden classes.
When students all sit in the same seats every day, the teacher can say "row 2, seat 3" to find Alice instantly. That's a hidden class — a known layout. If students sit randomly every day, the teacher has to search the whole room. That's a dictionary-style property lookup — much slower.
// ✅ GOOD — both objects have same property order → share hidden class C2 // V8 can use fast fixed-offset property access (like C struct) const a = { x: 1, y: 2 }; // hidden class: C0 → C1 (add x) → C2 (add y) const b = { x: 3, y: 4 }; // same shape → shares C2 → fast! // ❌ BAD — different property order → different hidden classes → slower const c = { x: 1, y: 2 }; // hidden class C2 const d = { y: 4, x: 3 }; // different class C4 — two separate layouts! // ❌ BAD — adding properties after creation creates new hidden classes const e = {}; e.x = 1; // C0 → C1 e.y = 2; // C1 → C2 (transition OK if always in this order) // ❌ WORST — deleting properties destroys hidden class sharing delete e.x; // forces dictionary mode — all property access becomes slow
Initialise all object properties in the constructor. Keep property order consistent. Use typed arrays for numeric data. Avoid mixing types in arrays ([1, 'two', 3]). Use factory functions that always produce the same shape.
Adding properties to objects after creation in different orders. Using delete on object properties. Storing different types in the same variable across calls. Arrays with holes ([1,,3]). Megamorphic call sites (calling same function with 4+ different object shapes).
TurboFan compiles hot functions to machine code based on assumptions about the types of values seen so far. If those assumptions are violated, V8 has to discard the native code and revert to the slower Ignition interpreter. This is a deoptimisation. It can cause a sudden spike in latency on an otherwise fast path.
Imagine a factory line optimised to make blue widgets. It runs fast because every machine is tuned for blue widgets. Suddenly you send a red widget through. The line has to stop, reconfigure for red, process it slowly, then decide whether to retool back. That retooling cost is deoptimisation.
// 1. Type change mid-function function add(a, b) { return a + b; } add(1, 2); // V8 optimises assuming a,b are always Numbers add('hi', 'yo'); // ❌ type changed → DEOPT → back to interpreter // 2. try/catch around hot code (historically; mostly fixed in modern V8) function hotPath() { try { /* hot work */ } catch(e) {} // can prevent optimisation } // 3. arguments object (old pattern — use rest params instead) function old() { return arguments[0]; } // ❌ arguments prevents opt function modern(...args) { return args[0]; } // ✅ rest params are fine // 4. for...in on objects with prototype properties (use Object.keys instead) // 5. eval() — prevents all optimisation of the enclosing function // 6. with statement — same issue
# Print every deopt to stdout with reason and source location
node --trace-deopt server.js

# Example output:
# [deoptimize] reason: wrong type at [doWork] /app/server.js:42
# [deoptimize] reason: lost precision at [parseAmount] /app/parser.js:17

# See all V8 optimisation decisions:
node --trace-opt --trace-deopt server.js 2>&1 | grep -E "(optimiz|deoptim)"

# Or use the --allow-natives-syntax flag + %GetOptimizationStatus() in tests
node --allow-natives-syntax -e "
  function add(a,b){ return a+b; }
  add(1,2); add(1,2);
  %OptimizeFunctionOnNextCall(add);
  add(1,2);
  console.log(%GetOptimizationStatus(add)); // bitmask — optimized bit set once TurboFan code is active
"
┌─────────────────────────────────────────────────────────────┐
│ V8 HEAP │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────────────┐ │
│ │ Young Gen │ │ Old Gen │ │ Large Objects │ │
│ │ (Nursery) │ │ (Tenured) │ │ Space │ │
│ │ │ │ │ │ │ │
│ │ New objects │ │ Survived 2+ │ │ Objects > 512 KB │ │
│ │ live here │ │ minor GCs │ │ live here │ │
│ │ ~1–8 MB │ │ ~1.4 GB max │ │ allocated once │ │
│ │ GC: fast │ │ GC: slow │ │ GC: full heap │ │
│ └──────────────┘ └──────────────┘ └───────────────────┘ │
│ │
│ OUTSIDE HEAP (not managed by V8 GC): │
│ Buffer data, ArrayBuffer data — allocated via C++ malloc │
└─────────────────────────────────────────────────────────────┘
Minor GC (Scavenge) — runs very frequently (every few hundred KB of allocation). Copies surviving objects to the other half of the young space. Objects that survive two minor GCs are "promoted" to Old Gen. Very fast (~1 ms). Most objects die young — this is the "generational hypothesis."
Major GC (Mark-Sweep-Compact) — runs infrequently but can take 50–200 ms. Marks all reachable objects from roots (globals, stack). Sweeps unreachable objects. Optionally compacts to reduce fragmentation. This is the GC pause you feel in latency spikes.
// 1. Using the perf_hooks PerformanceObserver const { PerformanceObserver } = require('perf_hooks'); const obs = new PerformanceObserver((list) => { for (const entry of list.getEntries()) { console.log('GC type:', entry.detail.kind, 'duration:', entry.duration.toFixed(2), 'ms'); } }); obs.observe({ type: 'gc' }); // 2. Heap statistics via process.memoryUsage() const mem = process.memoryUsage(); console.log({ heapUsed: (mem.heapUsed / 1024 / 1024).toFixed(1) + ' MB', // JS objects heapTotal: (mem.heapTotal / 1024 / 1024).toFixed(1) + ' MB', // V8 heap capacity external: (mem.external / 1024 / 1024).toFixed(1) + ' MB', // Buffer / C++ memory rss: (mem.rss / 1024 / 1024).toFixed(1) + ' MB' // total process RSS }); // 3. Expose GC logs at runtime node --expose-gc server.js // enables global.gc() to trigger GC manually node --trace-gc server.js // prints every GC event with type + duration
A memory leak happens when your code holds references to objects that are no longer needed, preventing the GC from collecting them. Leaked objects accumulate in Old Gen. Eventually heap usage reaches the V8 limit (~1.5 GB by default on 64-bit builds; configurable with --max-old-space-size) and the process crashes — or GC pauses become so frequent that response times degrade badly.
Memory leaks are insidious: the server starts fast, runs fine for hours, then slows down and eventually dies. The only fix is a restart — until you find and fix the root cause.
```js
// 1. Unbounded caches / collections (most common leak)
const cache = {}; // grows forever — never evicts
app.get('/user/:id', (req, res) => {
  cache[req.params.id] = fetchUser(req.params.id); // every userId ever hit → leak
});
// Fix: use a Map with a max size (LRU cache) or WeakMap for object keys

// 2. Event emitter listener accumulation
function addHandler() {
  emitter.on('data', (d) => process(d)); // adds a new listener on every call!
}
// Fix: use .once() or remove listeners in cleanup; check emitter.listenerCount()

// 3. Closure retaining large objects
function buildHandler(largeData) { // 50 MB object
  return (req, res) => {
    const id = largeData.id; // closure keeps largeData alive in memory
    res.send(id);
  };
}
// Fix: only close over what you need — const id = largeData.id; (then largeData can GC)

// 4. Circular references with external resources
class Connection {
  constructor() {
    this.socket = createSocket();
    this.socket.conn = this; // circular — neither can GC while the socket is open
  }
}
```
To find a leak with heap snapshots:

1. Start the app with `node --inspect server.js` and open Chrome DevTools → Memory tab.
2. Take a heap snapshot as a baseline.
3. Exercise the suspected leak (replay traffic, hit the endpoint repeatedly).
4. Force a GC (the bin icon in DevTools, or `global.gc()` with `--expose-gc`). Take a second snapshot.
5. Use the Comparison view to see which object types grew between snapshots — the leak is whatever keeps them referenced.

Use `WeakMap` when you need to associate extra data with an object but don't want to prevent it from being GC'd. If the key object is collected, the WeakMap entry disappears automatically. `WeakRef` holds a reference that does not prevent GC — call `deref()` and check for `undefined` before using.
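A minimal sketch of both in action — the names (`sessionData`, `bigObject`) are illustrative:

```js
// WeakMap: metadata keyed by an object, collected together with the key
const sessionData = new WeakMap();
function attach(socket) {
  sessionData.set(socket, { connectedAt: Date.now() });
}
// When `socket` becomes unreachable, its entry vanishes — no manual cleanup.

// WeakRef: hold a reference without keeping the target alive
let bigObject = { payload: 'x'.repeat(1e6) };
const ref = new WeakRef(bigObject);
bigObject = null;          // eligible for GC now
const maybe = ref.deref(); // the object, or undefined if already collected
if (maybe !== undefined) console.log('still alive');
```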
You schedule a callback with setTimeout(fn, 0) — meaning "run this as soon as possible, on the next pass through the timers phase." But the event loop is busy processing a long-running task, so your callback waits. The delay between when the callback was scheduled and when it actually ran is the event loop lag.
The event loop is a single cashier. Each customer is a callback. If one customer has a cart of 500 items (a long synchronous operation), everyone behind them waits the full duration. The waiting time is event loop lag. A 200 ms synchronous operation causes 200 ms of lag for every other request waiting in the queue.
```js
// Simple measurement using a setTimeout baseline
function measureLag() {
  const start = Date.now();
  setTimeout(() => {
    const lag = Date.now() - start; // should be ~0; anything > 50 ms is a problem
    if (lag > 50) console.warn('Event loop lag:', lag, 'ms');
    measureLag(); // schedule next measurement
  }, 0);
}
measureLag();

// Production-grade: use Node's built-in monitorEventLoopDelay (Node 11+)
const { monitorEventLoopDelay } = require('perf_hooks');
const h = monitorEventLoopDelay({ resolution: 20 }); // sample every 20 ms
h.enable();
setInterval(() => {
  console.log({
    mean: (h.mean / 1e6).toFixed(2),           // nanoseconds → ms
    p99:  (h.percentile(99) / 1e6).toFixed(2),
    max:  (h.max / 1e6).toFixed(2)
  });
  h.reset();
}, 5000);
```
| Cause | Symptom | Fix |
|---|---|---|
| Synchronous CPU-bound work | All requests delayed during computation | Move to Worker Thread or child_process |
| Large JSON.parse / JSON.stringify | Periodic lag spikes on large payloads | Use streaming JSON parser (stream-json), or do in worker |
| GC major collection pause | Periodic spikes every few seconds/minutes | Reduce allocation rate; tune heap size; use object pools |
| Synchronous crypto (pbkdf2Sync, scryptSync) | Lag during login/auth requests | Always use async versions: crypto.pbkdf2() |
| Blocking fs calls (readFileSync) | Lag during file access | Replace with async streams or fs.promises |
| Dense in-memory computation in hot path | Constant baseline lag | Profile with flame graph, optimise algorithm, or use worker |
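The first fix in the table — moving CPU-bound work off the main thread — looks roughly like this. A minimal sketch; `heavyCompute` and the file name are placeholders:

```js
// worker.js — runs on its own thread, so the event loop stays free
const { parentPort, workerData } = require('worker_threads');
const result = heavyCompute(workerData); // placeholder for the CPU-bound function
parentPort.postMessage(result);

// main.js — offload and await the result without blocking other requests
const { Worker } = require('worker_threads');
function runInWorker(input) {
  return new Promise((resolve, reject) => {
    const w = new Worker('./worker.js', { workerData: input });
    w.on('message', resolve);
    w.on('error', reject);
    w.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```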
What is the cluster module? How does it scale across CPU cores and how does it differ from Worker Threads?

A Node.js process runs your JavaScript on a single thread — one process uses at most one CPU core for JS. A 16-core server running one Node process leaves 15 cores idle. The cluster module forks multiple copies of your server process, one per CPU core, all sharing the same port. Incoming connections are distributed across the workers (round-robin by the primary on most platforms).
A single Node process is a bank with one teller. Cluster is the same bank with 16 tellers all serving customers from the same queue. Each teller is an independent copy of your entire application — they don't share memory, but they all answer the same phone number (port).
How it works:

- The primary process calls cluster.fork() once per CPU core. Each fork is a full child process running your server code.
- All workers listen() on the same port; the primary owns the listening socket and hands connections to workers.
- The primary listens for each worker's 'exit' event and forks a replacement automatically.

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');
const numCPUs = os.cpus().length;

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} forking ${numCPUs} workers`);
  for (let i = 0; i < numCPUs; i++) cluster.fork();
  cluster.on('exit', (worker, code) => {
    console.warn(`Worker ${worker.process.pid} died (code ${code}) — respawning`);
    cluster.fork(); // auto-restart crashed workers
  });
} else {
  // Each worker runs its own HTTP server on the same port
  http.createServer((req, res) => {
    res.end(`Worker ${process.pid} handled this`);
  }).listen(3000, () => {
    console.log(`Worker ${process.pid} listening`);
  });
}
```
| cluster | worker_threads | |
|---|---|---|
| Unit of work | Full HTTP server (I/O-bound scaling) | A specific CPU-bound task |
| Memory | Separate process — no shared memory | Same process — can share memory |
| Communication | IPC messages between processes | postMessage (fast) |
| Crash isolation | Full isolation — one crash doesn't kill others | Worker crash emits event on main thread |
| Use case | Maximise HTTP throughput on multi-core servers | Offload one expensive computation |
In practice: Use PM2 (pm2 start app.js -i max) instead of writing cluster code manually. PM2 manages forking, restart on crash, zero-downtime reload, and metrics out of the box.
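A few of the everyday PM2 commands behind that recommendation — a sketch of typical usage:

```bash
pm2 start app.js -i max     # fork one worker per CPU core (cluster mode)
pm2 reload app              # zero-downtime rolling restart of all workers
pm2 logs app                # tail aggregated logs from every worker
pm2 monit                   # live CPU / memory dashboard per process
pm2 startup && pm2 save     # resurrect the process list after a reboot
```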
```js
// An LRU cache built on a bounded Map (Map preserves insertion order)
class LRUCache {
  constructor(maxSize = 500) {
    this.cache = new Map();
    this.maxSize = maxSize;
  }
  get(key) {
    if (!this.cache.has(key)) return undefined;
    const val = this.cache.get(key);
    this.cache.delete(key);
    this.cache.set(key, val); // move to end (most recently used)
    return val;
  }
  set(key, val) {
    if (this.cache.has(key)) this.cache.delete(key);
    else if (this.cache.size >= this.maxSize)
      this.cache.delete(this.cache.keys().next().value); // evict LRU (first entry)
    this.cache.set(key, val);
  }
}
```
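Usage mirrors a plain Map; the route and `fetchUser` helper below are illustrative:

```js
const userCache = new LRUCache(1000);

app.get('/user/:id', async (req, res) => {
  let user = userCache.get(req.params.id);
  if (!user) {
    user = await fetchUser(req.params.id); // assumed DB helper
    userCache.set(req.params.id, user);    // bounded — old entries evicted, no leak
  }
  res.json(user);
});
```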
```js
const crypto = require('crypto');

// 1. Cache-Control: max-age — browser caches for N seconds, no server hit
res.setHeader('Cache-Control', 'public, max-age=3600'); // 1 hour

// 2. ETag — fingerprint of the response; conditional GET saves bandwidth
const etag = crypto.createHash('sha1').update(body).digest('hex');
res.setHeader('ETag', etag);
if (req.headers['if-none-match'] === etag) {
  res.writeHead(304); res.end(); // Not Modified — no body sent!
  return;
}

// 3. stale-while-revalidate — serve stale immediately, refresh in background
res.setHeader('Cache-Control', 'public, max-age=60, stale-while-revalidate=300');
// Browser serves cached version instantly; background request updates cache

// 4. Cache-Control: private — per-user data, only browser caches (not CDN)
res.setHeader('Cache-Control', 'private, max-age=0, must-revalidate');
```
| Metric | What it measures | Good target (typical API) |
|---|---|---|
| Throughput (RPS) | Requests per second at saturation | Depends on workload — measure your baseline |
| p50 latency | Median response time — what most users experience | < 20 ms for simple API |
| p99 latency | Worst 1% of requests — tail latency | < 200 ms — often 10–50× higher than p50 |
| p99.9 latency | Worst 0.1% — the "long tail" | Watch GC pauses here |
| Error rate | % of non-2xx responses under load | 0% — any errors under load indicate a bug |
| Memory growth | Heap used after N minutes of load | Stable (flat line) — any growth = potential leak |
```bash
# Install
npm install -g autocannon

# Basic benchmark: 100 connections, 30 seconds
autocannon -c 100 -d 30 http://localhost:3000/api/users
# Output shows:
# Stat    | 2.5% | 50%  | 97.5% | 99%  | Avg     | Stdev | Max
# Latency | 4 ms | 7 ms | 18 ms | 45ms | 8.2 ms  | 3.1   | 234ms
# Req/Sec | 9200 |10500 | 11400 | ...  | 10213.5 | ...

# POST with JSON body
autocannon -c 50 -d 20 \
  -m POST \
  -H 'Content-Type: application/json' \
  -b '{"name":"test"}' \
  http://localhost:3000/api/users
```

```js
// Programmatic use in Node — autocannon() returns a promise
const autocannon = require('autocannon');
const result = await autocannon({ url: 'http://localhost:3000', connections: 100 });
console.log(autocannon.printResult(result));
```
Benchmark against a production-like setup: warm up first — e.g. -w 5 (warmup connections) in autocannon — and always run with NODE_ENV=production.

"My approach: measure first with autocannon to establish a baseline and understand if the problem is latency or throughput. Then run clinic doctor to identify the category — CPU, event loop, or memory. If CPU: use 0x for a flame graph to find the hot function. If memory: use heap snapshots in DevTools to find what's growing. If event loop lag: monitorEventLoopDelay + look for synchronous blocking. Only then do I change code — and always re-benchmark after to verify the fix actually helped."
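The tools named in that answer are all CLI-driven; a sketch of how they are typically invoked (flags vary by version):

```bash
# Categorise the problem: CPU, GC, event loop, or I/O
npx clinic doctor -- node server.js   # drive load in another terminal, then Ctrl+C

# CPU flame graph — find the hot function
npx 0x -- node server.js

# Drill into async / event loop behaviour
npx clinic bubbleprof -- node server.js
```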
To defend a system you must understand how it is attacked. Almost every vulnerability in web applications comes down to one root cause: trusting data that comes from outside your process. User input, HTTP headers, environment variables, database responses, third-party package code — any of these can be malicious. Your job is to treat every external input as potentially hostile until you have validated it.
A good bank teller does not hand over money just because someone claims to be the account owner. They verify ID, check the signature, confirm the amount is available, log the transaction, and alert the manager if anything is unusual. Every step is a security control. Good security code works the same way: verify, validate, sanitise, log, and limit.
Injection happens when you mix untrusted data with a command or query in a way that the data can change the structure of that command. Instead of being treated as a value, the attacker's input is interpreted as code. The root cause is always the same: string concatenation where parameterisation should be used.
Parameterised queries are like a structured form: there is a "name" box and a "surname" box. The form processor knows exactly which part is data. String concatenation is like telling someone what to write: "Write: hello, [whatever the user typed]." If the user types "; DROP TABLE users;--" the person writes that verbatim — and your database executes it.
```js
// ❌ VULNERABLE — string concatenation builds the query
// Attacker sends: username = "admin'--"
// Query becomes: SELECT * FROM users WHERE name='admin'--' AND pass='x'
// The -- comments out the password check → login bypassed!
const query = `SELECT * FROM users
  WHERE name='${req.body.username}' AND pass='${req.body.password}'`;
await db.query(query);

// ✅ FIXED — parameterised query: data is NEVER part of the SQL string
const row = await db.query(
  'SELECT * FROM users WHERE name = $1 AND pass = $2',
  [req.body.username, req.body.password] // values sent separately, never interpolated
);
// The DB driver sends query structure and values in separate packets.
// The DB can NEVER interpret a value as SQL syntax.
```
```js
// ❌ VULNERABLE — attacker sends JSON body: { "username": {"$gt": ""} }
// req.body.username is now the object { $gt: "" }
// MongoDB query becomes: { username: { $gt: "" } } → matches ALL users!
const user = await User.findOne({
  username: req.body.username, // object passed directly → operator injection
  password: req.body.password
});

// ✅ FIXED — enforce string type before passing to the query
const sanitise = (val) => typeof val === 'string' ? val : '';
const safeUser = await User.findOne({
  username: sanitise(req.body.username),
  password: sanitise(req.body.password)
});
// Better: use a validation library like Zod or Joi to enforce schema at the boundary
```
```js
const { exec, execFile } = require('child_process');

// ❌ VULNERABLE — attacker sends: filename = "report.pdf; cat /etc/passwd"
// Shell executes: convert report.pdf; cat /etc/passwd → reads /etc/passwd!
exec(`convert ${req.query.filename} output.png`, (err, out) => { ... });

// ✅ FIX 1 — use execFile instead of exec (no shell, args are separate)
execFile('convert', [req.query.filename, 'output.png'], ...);
// execFile does not invoke a shell — the filename cannot inject commands

// ✅ FIX 2 — validate input strictly (allowlist, not blocklist)
if (!/^[a-zA-Z0-9_\-]+\.pdf$/.test(req.query.filename)) {
  return res.status(400).send('Invalid filename');
}
```
Every plain JavaScript object inherits from Object.prototype. If an attacker can set a property on Object.prototype, that property will appear on every object in your application — because all objects inherit from it. This can change control flow, bypass security checks, or enable remote code execution.
Imagine every object in your program drinks from a shared water supply (Object.prototype). If an attacker adds a poison (a property) to the supply, every object that drinks from it gets affected — even objects created after the poisoning. That is prototype pollution.
```js
// How pollution happens — a naive "deep merge" function
function merge(target, source) {
  for (const key of Object.keys(source)) {
    if (typeof source[key] === 'object') {
      target[key] = merge(target[key] || {}, source[key]);
    } else {
      target[key] = source[key]; // ❌ allows writing to __proto__!
    }
  }
  return target;
}

// Attacker sends: { "__proto__": { "isAdmin": true } }
// After merge: Object.prototype.isAdmin === true
merge({}, JSON.parse('{"__proto__": {"isAdmin": true}}'));

// Now EVERY object inherits isAdmin: true
const user = { name: 'alice' }; // no isAdmin property of its own
console.log(user.isAdmin);      // true ← inherited from the polluted prototype!

// If your auth check is: if (user.isAdmin) grantAdminAccess();
// → every user is now an admin
```
```js
// 1. Block dangerous keys in merge/deep-clone functions
const BLOCKED = new Set(['__proto__', 'constructor', 'prototype']);
function safeMerge(target, source) {
  for (const key of Object.keys(source)) {
    if (BLOCKED.has(key)) continue; // skip dangerous keys
    target[key] = source[key];
  }
  return target;
}

// 2. Use null-prototype objects for untrusted data (no prototype chain)
const safe = Object.create(null); // no __proto__, no prototype chain
safe['__proto__'] = 'attack';     // just sets a normal property, harmless

// 3. Freeze Object.prototype (prevents any additions)
Object.freeze(Object.prototype);  // pollution attempts throw in strict mode

// 4. Use structured validation (Zod/Joi) to only allow known keys
// Unknown keys are stripped before they ever reach merge logic
const schema = z.object({ name: z.string(), age: z.number() }).strict();
// .strict() rejects any extra keys not in the schema
```
Run npm audit regularly — many advisories are specifically for prototype pollution in deep-merge utilities.
Why are bcrypt / argon2 used instead of SHA-256 or MD5?

SHA-256 and MD5 are designed to be fast — that is their purpose for checksums and signatures. But for passwords, speed is the enemy. A modern GPU can compute 10 billion SHA-256 hashes per second. If an attacker steals your hashed password database, they can try every possible 8-character password in seconds. A fast hash provides almost no protection.
A fast hash (SHA-256) is a screen door. It looks like a door, but an attacker can get through it in moments. bcrypt is a bank vault door — it is designed to be slow (deliberately). Even if an attacker gets the hash, cracking it takes years instead of seconds. The slowness is the feature.
Algorithm | Time to crack (GPU farm) | Why
─────────────────────────────────────────────────────────────
MD5 | < 1 second | 50 billion hashes/sec
SHA-256 | ~1 second | 10 billion hashes/sec
bcrypt(10) | ~10-100 years | ~200 hashes/sec (cost factor 10)
argon2id | > 100 years | Tunable; also memory-hard
─────────────────────────────────────────────────────────────
"memory-hard" = requires large RAM → can't parallelise on GPU cheaply
bcrypt has two key properties: a cost factor (work factor) that doubles the computation time for every increment, and a built-in random salt that is stored alongside the hash. The salt ensures that two users with the same password produce completely different hashes, defeating rainbow table attacks.
```js
const bcrypt = require('bcrypt');

// REGISTRATION — hash the password before storing
const register = async (plainPassword) => {
  const saltRounds = 12; // 2^12 iterations — ~250 ms on a modern CPU
  const hash = await bcrypt.hash(plainPassword, saltRounds);
  // Store 'hash' in the DB. NEVER store the plaintext password.
  await db.save({ passwordHash: hash });
};

// LOGIN — compare against the stored hash
const login = async (plainPassword, storedHash) => {
  const match = await bcrypt.compare(plainPassword, storedHash);
  // bcrypt.compare is timing-safe — no timing oracle vulnerability
  if (!match) throw new Error('Invalid credentials');
};

// argon2 — recommended for new projects (winner of the Password Hashing Competition)
const argon2 = require('argon2');
const hash = await argon2.hash(password, { type: argon2.argon2id });
const valid = await argon2.verify(hash, password);
```
A JSON Web Token is a self-contained credential. The server signs it with a secret key, so it can verify later that it created the token — without needing to look anything up in a database. A JWT has three base64-encoded parts separated by dots:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9 ← Header (algorithm + type)
.
eyJ1c2VySWQiOiI0MiIsInJvbGUiOiJ1c2VyIn0 ← Payload (your claims — NOT encrypted!)
.
SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c ← Signature (HMAC or RSA of header+payload)
Header: { "alg": "HS256", "typ": "JWT" }
Payload: { "userId": "42", "role": "user", "exp": 1700000000 }
Signature: HMAC-SHA256(base64(header) + "." + base64(payload), SECRET_KEY)
⚠ The payload is base64-encoded, NOT encrypted.
Anyone can decode it. NEVER put passwords or sensitive data in a JWT.
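To make that point concrete — decoding needs no secret at all. A short sketch using the token above (Node's Buffer supports base64url since v15.7):

```js
// No secret, no library — the payload is plain base64url
const token = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VySWQiOiI0MiIsInJvbGUiOiJ1c2VyIn0.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c';
const [header, payload] = token.split('.');
console.log(JSON.parse(Buffer.from(header, 'base64url').toString()));
// { alg: 'HS256', typ: 'JWT' }
console.log(JSON.parse(Buffer.from(payload, 'base64url').toString()));
// { userId: '42', role: 'user' }
```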
```js
const jwt = require('jsonwebtoken');

// ─── Vulnerability 1: Algorithm "none" attack ───────────────────────────────
// Attacker modifies header: { "alg": "none" } → signs with an empty signature
// If your code trusts 'alg' from the token header → anyone can forge a token!

// ❌ VULNERABLE — accepts the algorithm from the token
jwt.verify(token, secret); // old versions trusted the header's alg field

// ✅ FIXED — explicitly specify the allowed algorithm
jwt.verify(token, secret, { algorithms: ['HS256'] });

// ─── Vulnerability 2: Weak/missing secret ───────────────────────────────────
// ❌ VULNERABLE — short or predictable secret is brute-forceable
jwt.sign(payload, 'secret');      // brute-forced in milliseconds
jwt.sign(payload, 'mysecret123'); // still trivially weak

// ✅ FIXED — 256+ bit cryptographically random secret
// Generate once: node -e "console.log(require('crypto').randomBytes(64).toString('hex'))"
jwt.sign(payload, process.env.JWT_SECRET); // 128-hex-char secret from env

// ─── Vulnerability 3: No expiry ─────────────────────────────────────────────
// ❌ VULNERABLE — token never expires; a stolen token works forever
jwt.sign({ userId: 42 }, secret);

// ✅ FIXED — always set expiry
jwt.sign({ userId: 42 }, secret, { expiresIn: '15m' }); // short-lived access token
// Pair with a long-lived refresh token stored in an httpOnly cookie

// ─── Vulnerability 4: JWT in localStorage ───────────────────────────────────
// ❌ VULNERABLE — XSS can steal localStorage tokens
// ✅ FIXED — store in an httpOnly, secure, sameSite cookie
// httpOnly = JS cannot read it → XSS cannot steal it
```
"JWTs have three common attack vectors: the 'alg: none' attack (always specify allowed algorithms explicitly), weak secrets (use 256+ bit random), and missing expiry (always set expiresIn and use refresh tokens). Additionally, store JWTs in httpOnly cookies — not localStorage — so XSS cannot steal them. The payload is only base64, not encrypted, so never put sensitive data there."
HTTP security headers are instructions from your server to the browser: "here is how you are allowed to use this page." The browser enforces these rules, providing a second layer of defence even if your application code has vulnerabilities. They cost nothing and should be set on every response.
| Header | Protects against | Recommended value |
|---|---|---|
| `Content-Security-Policy` | XSS — restricts which scripts, styles, and resources can load | `default-src 'self'; script-src 'self'` |
| `Strict-Transport-Security` | HTTPS downgrade / SSL stripping attacks | `max-age=31536000; includeSubDomains` |
| `X-Content-Type-Options` | MIME sniffing — browser misinterpreting file type | `nosniff` |
| `X-Frame-Options` | Clickjacking — embedding your page in a malicious iframe | `DENY` or `SAMEORIGIN` |
| `Referrer-Policy` | Leaking sensitive URLs in the Referer header to third parties | `no-referrer-when-downgrade` |
| `Permissions-Policy` | Restricts browser features (camera, microphone, geolocation) | `camera=(), microphone=(), geolocation=()` |
| `X-Powered-By` | Fingerprinting — remove it to hide that you use Express/Node | Remove entirely (`app.disable('x-powered-by')`) |
```js
const express = require('express');
const helmet = require('helmet');
const app = express();

// Sets ~12 security headers automatically (use this on every Express app)
app.use(helmet());

// Customise CSP for your specific needs
app.use(helmet.contentSecurityPolicy({
  directives: {
    defaultSrc: ["'self'"],
    scriptSrc: ["'self'", 'https://cdn.example.com'],
    styleSrc: ["'self'", "'unsafe-inline'"], // avoid unsafe-inline in prod
    imgSrc: ["'self'", 'data:', 'https:'],
    connectSrc: ["'self'", 'https://api.example.com'],
    upgradeInsecureRequests: []
  }
}));

// HSTS — tell browsers to ALWAYS use HTTPS for this domain (1 year)
app.use(helmet.strictTransportSecurity({
  maxAge: 31_536_000,
  includeSubDomains: true,
  preload: true
}));
```
What is CSRF, and what role does the SameSite cookie attribute play?

CSRF exploits the fact that browsers automatically include cookies with every request to a domain — even if the request originated from a different website. If a user is logged into your site (session cookie stored in the browser) and visits a malicious site, that site can make the user's browser silently send a request to your API — with the user's real cookies attached.
Classic example: a malicious page embeds a hidden auto-submitting form or image tag pointing at bank.example.com/transfer?amount=1000&to=attacker — the victim's browser attaches their real bank session cookie automatically.

```js
// ── Layer 1: SameSite cookie attribute (modern, most effective) ──────────────
// SameSite=Strict: cookie NOT sent for ANY cross-site request (breaks OAuth flows)
// SameSite=Lax:    cookie NOT sent for cross-site POST/PUT/DELETE (good default)
// SameSite=None:   always sent (requires Secure; needed for third-party embeds)
res.cookie('session', token, {
  httpOnly: true,  // JS cannot read (XSS protection)
  secure: true,    // HTTPS only
  sameSite: 'lax', // not sent on cross-site POST → CSRF mitigated
  maxAge: 900_000  // 15 min in ms
});

// ── Layer 2: CSRF token (double-submit cookie pattern) ───────────────────────
// Server generates a random token, stored in a cookie AND a hidden form field.
// On POST: verify the form field matches the cookie value.
// An attacker's page cannot read your cookies (same-origin policy) → cannot forge.
const Tokens = require('csrf');
const tokens = new Tokens();

// On form render:
const secret = await tokens.secret();
const token = tokens.create(secret);
req.session.csrfSecret = secret;
// Include <input type="hidden" name="_csrf" value="${token}"> in your form

// On form submit — verify the token:
if (!tokens.verify(req.session.csrfSecret, req.body._csrf)) {
  return res.status(403).send('Invalid CSRF token');
}

// ── Layer 3: Check the Origin/Referer header ─────────────────────────────────
// Cross-site requests have Origin set to the attacker's domain
const origin = req.headers['origin'] || req.headers['referer'];
if (origin && !origin.startsWith('https://yourdomain.com')) {
  return res.status(403).send('Forbidden');
}
```
Validation: reject input that does not match the expected shape/type/range. Returns an error to the caller; never modifies the data. Examples: "Is this a valid email?", "Is this age between 0 and 120?", "Is this UUID format correct?"
Sanitisation: transform input to remove potentially dangerous content before using it. Examples: strip HTML tags before storing a user bio, trim whitespace, normalise unicode, encode special characters for SQL/HTML output.
```js
const { z } = require('zod');

// Define the shape you expect — anything outside this is rejected
const CreateUserSchema = z.object({
  username: z.string().min(3).max(30).regex(/^[a-zA-Z0-9_]+$/),
  email: z.string().email(),
  age: z.number().int().min(13).max(120),
  role: z.enum(['user', 'moderator']) // explicit allowlist
}).strict(); // .strict() rejects any unknown keys (prevents mass assignment)

app.post('/users', async (req, res) => {
  const result = CreateUserSchema.safeParse(req.body);
  if (!result.success) {
    return res.status(400).json({ errors: result.error.flatten().fieldErrors });
  }
  // result.data is fully typed and validated — safe to use
  await createUser(result.data);
  res.status(201).json({ ok: true });
});
```
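And a minimal sanitisation counterpart — the helpers here are illustrative sketches, not from a specific library:

```js
// Normalise and defuse input before storing/rendering — transform, don't reject
function sanitiseBio(raw) {
  return String(raw)
    .normalize('NFC')        // normalise unicode forms
    .trim()                  // strip leading/trailing whitespace
    .replace(/<[^>]*>/g, '') // strip HTML tags (naive — use a real sanitiser
                             // like DOMPurify for rich text)
    .slice(0, 500);          // enforce a max length
}

// Encode for safe HTML output (prevents stored XSS when rendering)
const escapeHtml = (s) => s
  .replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
  .replace(/"/g, '&quot;').replace(/'/g, '&#39;');
```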
```js
const path = require('path');

// ❌ VULNERABLE — attacker requests: /files?name=../../etc/passwd
app.get('/files', (req, res) => {
  res.sendFile('/uploads/' + req.query.name); // reads ANY file on the system!
});

// ✅ FIXED — resolve and verify the path stays within the allowed directory
app.get('/files', (req, res) => {
  const base = path.resolve('/uploads');
  const filePath = path.resolve(base, req.query.name);
  if (!filePath.startsWith(base + path.sep)) {
    return res.status(403).send('Forbidden'); // outside allowed dir
  }
  res.sendFile(filePath);
});
```
Attacker tries thousands of password combinations against /login. Without rate limiting: 10,000 attempts per second is trivial. With rate limiting: 5 attempts per 15 minutes — an attacker cannot crack a reasonable password in a lifetime.
Attacker uses a list of leaked usernames/passwords from other breaches and tries them against your login. Also prevents scraping, API abuse, and denial-of-service via expensive endpoints (search, reports).
```js
const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis').default;

// General API rate limit — 100 requests per 15 minutes per IP
const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100,
  standardHeaders: true, // sets RateLimit-* headers in the response
  legacyHeaders: false,
  store: new RedisStore({ sendCommand: (...args) => redisClient.sendCommand(args) })
  // Redis store: counters survive server restarts and work across cluster workers
});

// Strict limit for auth endpoints — 5 attempts per 15 minutes
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 5,
  message: 'Too many login attempts. Try again in 15 minutes.',
  skipSuccessfulRequests: true // don't count successful logins
});

app.use('/api', apiLimiter);
app.post('/login', authLimiter, loginHandler);

// ⚠ Important: rate limit by user ID AFTER authentication for logged-in endpoints.
// IP-based limits can be bypassed with multiple IPs and can affect legitimate
// shared networks (NAT, VPN, office). For auth endpoints, IP is acceptable.
```
Server-Side Request Forgery (SSRF) happens when your server makes an HTTP request to a URL that an attacker controls — and the attacker uses this to reach systems that are only accessible from within your internal network. Your server becomes a proxy for the attacker.
Imagine a company receptionist who, when asked, will fetch any document from any internal room and read it to you. An outsider cannot enter the building — but they can ask the receptionist to fetch sensitive documents from the CEO's office. The receptionist is trusted internally, so doors open for them. Your server is that receptionist — it has access to internal services that external attackers cannot reach directly.
Classic SSRF target: http://169.254.169.254/latest/meta-data/iam/security-credentials/ — the AWS EC2 metadata endpoint, only accessible from within the cloud instance.

```js
const dns = require('dns/promises');
const { URL } = require('url');

// Private/internal IP ranges to block (prefix match is a simplification —
// e.g. RFC1918 172.16.0.0/12 actually spans 172.16.* through 172.31.*)
const BLOCKED_PREFIXES = [
  '10.', '172.16.', '192.168.', // RFC1918 private
  '127.', '::1', 'localhost',   // loopback
  '169.254.', 'fd',             // link-local + IPv6 private
  '0.'                          // 0.0.0.0
];

async function isSafeUrl(rawUrl) {
  let parsed;
  try { parsed = new URL(rawUrl); } catch { return false; }

  // 1. Only allow https (never file://, ftp://, gopher://)
  if (parsed.protocol !== 'https:') return false;

  // 2. Resolve DNS and check the actual IP (prevents DNS rebinding)
  const addresses = await dns.lookup(parsed.hostname, { all: true });
  for (const { address } of addresses) {
    if (BLOCKED_PREFIXES.some(p => address.startsWith(p))) return false;
  }

  // 3. Allowlist approach (even better): only allow specific hostnames
  const ALLOWED_HOSTS = new Set(['api.partner.com', 'cdn.example.com']);
  return ALLOWED_HOSTS.has(parsed.hostname);
}
```
In development, keep secrets in a .env file (loaded by dotenv). Add .env to .gitignore immediately. Provide a .env.example with dummy values for team onboarding.

```js
// ❌ NEVER — hardcoded secret in source
const secret = 'sk_live_abc123xyz';

// ✅ Environment variable (development)
require('dotenv').config(); // reads the .env file (never commit .env!)
const secret = process.env.STRIPE_SECRET_KEY;
if (!secret) throw new Error('STRIPE_SECRET_KEY is required'); // fail fast

// ✅ AWS Secrets Manager (production)
const {
  SecretsManagerClient,
  GetSecretValueCommand
} = require('@aws-sdk/client-secrets-manager');

const client = new SecretsManagerClient({ region: 'us-east-1' });
const { SecretString } = await client.send(
  new GetSecretValueCommand({ SecretId: 'prod/myapp/db' })
);
const { password } = JSON.parse(SecretString);
```
Your node_modules folder contains thousands of packages you did not write. Any of them could have a vulnerability — or be malicious (supply chain attack). This is now one of the most common attack vectors against Node.js applications.
```bash
# 1. Audit for known vulnerabilities (run after every npm install)
npm audit

# 2. Auto-fix safe upgrades
npm audit fix

# 3. Snyk — deeper analysis including transitive deps and license issues
npx snyk test

# 4. Lock your dependency versions (always commit package-lock.json)
#    Use exact versions for critical deps: "express": "4.18.2" not "^4.18.2"

# 5. Check for typosquatting before installing
#    "lodash" is safe. "1odash" (number one) is a malicious package.
#    Always verify the package name exactly.

# 6. Limit what packages can do — Node's experimental permission model
#    (filesystem-focused; exact flags and coverage vary by Node version)
node --experimental-permission --allow-fs-read=/data server.js
# Node throws if the app tries to read/write outside the allowed paths
```
"My security checklist: parameterised queries for all DB access (no string concatenation), Zod/Joi validation at every API boundary, Helmet.js for security headers, rate limiting on auth endpoints, httpOnly+SameSite=Lax cookies for sessions, bcrypt/argon2 for passwords, short-lived JWTs with explicit algorithm, secrets in environment variables (never in code), npm audit in CI, and SSRF protection on any endpoint that fetches user-supplied URLs. Defence in depth — each layer assumes the previous one can fail."
A test proves that your code does what you think it does — right now. More importantly, it proves it still does that six months later after fifty other changes. Tests are a time machine: you write them once and they keep checking correctness forever, automatically, in milliseconds.
A pilot does not skip the pre-flight checklist just because the plane flew fine yesterday. Conditions change, things break, humans make mistakes. The checklist runs every time, automatically catching problems before they become disasters. Your test suite is that checklist — it runs on every commit, every deploy, catching regressions before users see them.
- Confidence to refactor — change internals without fear.
- Living documentation — tests show how code is meant to be used.
- Faster debugging — a failing test pinpoints exactly what broke.
- Design pressure — hard-to-test code is usually badly designed.
Tests cannot prove the absence of bugs — only the presence of the behaviours you tested. 100% code coverage does not mean 100% correct. Tests are only as good as the scenarios you imagined. Always pair tests with code review and monitoring.
▲
/ \
/ E2E \ Few, slow, brittle, expensive
/─────────\ Tests the whole system via UI/API
/ \
/ Integration \ Some — test module boundaries
/───────────────\ Real DB, HTTP, file system
/ \
/ Unit Tests \ Many, fast, isolated, cheap
/─────────────────────\ One function, all deps mocked
───────────────────────────
Rule of thumb: 70% unit / 20% integration / 10% E2E
Why the shape? Unit tests are fast (ms), E2E tests are slow (seconds)
Fast tests run on every save; slow ones run in CI
| Unit | Integration | End-to-End (E2E) | |
|---|---|---|---|
| What it tests | One function / class in isolation | Multiple components together (routes, DB, cache) | The whole system via UI or real HTTP |
| Speed | ~1–5 ms per test | ~50–500 ms per test | ~1–30 s per test |
| Dependencies | All mocked | Some real (DB), some mocked | All real (deployed env) |
| Catches | Logic bugs in one unit | Contract bugs between units | User-visible workflow failures |
| Misses | Integration bugs | UI-level issues | Edge cases (covered by unit tests) |
| Tools | Jest, Vitest, Mocha | Jest + supertest + testcontainers | Playwright, Cypress, k6 |
Many teams write almost no unit tests but have hundreds of E2E tests ("ice cream cone" shape). E2E tests are 100× slower, break on unrelated UI changes, are hard to debug, and cannot isolate exactly what failed. A suite of 200 E2E tests that takes 30 minutes to run gives you less confidence than 2000 unit tests that run in 5 seconds. The pyramid shape is intentional.
Explain Jest's building blocks — describe, it, expect, matchers, and beforeEach / afterEach. What are the setup / teardown lifecycle hooks?

```js
// ── Structure ──────────────────────────────────────────────────────
describe('UserService', () => { // groups related tests
  let db, service;

  // ── Lifecycle hooks ──────────────────────────────────────────────
  beforeAll(async () => {  // runs ONCE before all tests in this describe
    db = await createTestDb();
  });
  afterAll(async () => {   // runs ONCE after all tests
    await db.close();
  });
  beforeEach(async () => { // runs before EACH test — reset state
    await db.clear();
    service = new UserService(db);
  });
  afterEach(() => {        // runs after EACH test — cleanup
    jest.clearAllMocks();
  });

  // ── Tests ────────────────────────────────────────────────────────
  it('creates a user with hashed password', async () => {
    const user = await service.create({ name: 'Alice', password: 'secret' });
    expect(user.id).toBeDefined();
    expect(user.name).toBe('Alice');
    expect(user.password).not.toBe('secret');     // must be hashed
    expect(user.password).toMatch(/^\$2[aby]\$/); // bcrypt format
  });

  it('throws when email already exists', async () => {
    await service.create({ email: 'a@b.com' });
    await expect(service.create({ email: 'a@b.com' }))
      .rejects.toThrow('Email already exists');
  });
});
```
```js
// Equality
expect(val).toBe(42);               // strict equality (===) — use for primitives
expect(obj).toEqual({ a: 1 });      // deep equality — use for objects/arrays
expect(obj).toStrictEqual({});      // like toEqual but also checks undefined props

// Truthiness
expect(val).toBeTruthy();           // not null/undefined/0/false/''
expect(val).toBeFalsy();            // null/undefined/0/false/''
expect(val).toBeNull();             // exactly null
expect(val).toBeUndefined();        // exactly undefined

// Numbers
expect(0.1 + 0.2).toBeCloseTo(0.3); // float comparison (avoids 0.30000000004)
expect(5).toBeGreaterThan(3);

// Strings / arrays
expect('hello').toContain('ell');
expect([1, 2, 3]).toContain(2);
expect('hello').toMatch(/^hel/);

// Partial object matching
expect(user).toMatchObject({ name: 'Alice', role: 'admin' });
// passes even if user has extra properties — great for API responses

// Async errors
await expect(asyncFn()).rejects.toThrow('message');
await expect(asyncFn()).rejects.toMatchObject({ code: 'NOT_FOUND' });
```
If you forget to return a Promise or await it in a test, Jest considers the test finished as soon as the synchronous code completes — before the async assertions run. The test passes even if the assertions would have failed. This is a false positive: your tests lie to you.
```js
// ── Pattern 1: async/await (recommended) ─────────────────────────
it('fetches a user', async () => {
  const user = await getUser(1);
  expect(user.name).toBe('Alice');
});

// ── Pattern 2: return a Promise ──────────────────────────────────
it('fetches a user', () => {
  return getUser(1).then(user => { // ← MUST return. Omitting = silent pass.
    expect(user.name).toBe('Alice');
  });
});

// ── Pattern 3: callbacks — use done() ────────────────────────────
it('reads a file', (done) => {
  fs.readFile('test.txt', (err, data) => {
    expect(err).toBeNull();
    expect(data.toString()).toBe('hello');
    done(); // signal Jest the test is complete. Forgetting = timeout.
  });
});

// ── Common pitfall: not awaiting rejections ──────────────────────
// ❌ WRONG — test passes even if the promise never rejects
it('throws on bad input', () => {
  expect(badFn()).rejects.toThrow(); // forgot await/return — always passes!
});

// ✅ CORRECT — await the assertion
it('throws on bad input', async () => {
  await expect(badFn()).rejects.toThrow('Invalid input');
});

// ── expect.assertions(n) — fail the test if fewer assertions ran ─
it('should reach the catch block', async () => {
  expect.assertions(1); // test FAILS if this assertion was never reached
  try {
    await badFn();
  } catch (err) {
    expect(err.message).toBe('Invalid input');
  }
});
```
A unit test should test one thing. If your function calls a database, sends an email, and calls an external API, a test failure could be caused by any of those systems — not your code. Test doubles replace real dependencies with controlled substitutes so your test only verifies your logic.
A stub is a cardboard prop — it looks like a sword but doesn't do anything real. A fake is a rubber sword that can actually be swung. A spy is a hidden camera on the real sword. A mock is a prop with instructions: "record every time someone draws it, and if it's drawn before the fight scene, fail the take."
```js
// ── STUB — returns a fixed value. You don't care HOW it was called.
const getUser = jest.fn().mockResolvedValue({ id: 1, name: 'Alice' });
// Call it — it always returns that object. No assertions on how it was called.

// ── SPY — wraps a REAL function but records calls
const consoleSpy = jest.spyOn(console, 'log');
myFunction(); // console.log still runs, but calls are recorded
expect(consoleSpy).toHaveBeenCalledWith('expected message');
consoleSpy.mockRestore(); // restore the original after the test

// ── MOCK — like a spy but with pre-programmed expectations
const emailSender = { send: jest.fn() };
sendWelcomeEmail(emailSender, 'alice@example.com');
// Now verify HOW it was called:
expect(emailSender.send).toHaveBeenCalledTimes(1);
expect(emailSender.send).toHaveBeenCalledWith({
  to: 'alice@example.com',
  subject: expect.stringContaining('Welcome')
});

// ── FAKE — a working lightweight implementation (no jest.fn needed)
class FakeUserRepository {
  constructor() { this.users = new Map(); }
  async save(user) { this.users.set(user.id, user); return user; }
  async findById(id) { return this.users.get(id) ?? null; }
  async clear() { this.users.clear(); }
}
// Tests use this instead of a real DB — fast, isolated, but real logic runs
```
| Double | Use when |
|---|---|
| Stub | You need to control what a dependency returns but don't care about the call details |
| Spy | You want real behaviour to run but also need to assert how the function was called |
| Mock | You want to replace a dependency AND assert exactly how it was called |
| Fake | The real dependency is too complex/slow but you want real logic (e.g. in-memory DB) |
How does jest.mock() work? What are the rules and gotchas?

When you call jest.mock('moduleName'), Jest intercepts all require() calls to that module for the duration of the test file and replaces them with auto-generated mocks. Critically, Jest hoists jest.mock() calls to the top of the file at compile time — before any imports — so the mock is in place before your production code loads the module.
```js
// ── Auto-mock an entire module ───────────────────────────────────
jest.mock('nodemailer'); // all exports become jest.fn() automatically
const nodemailer = require('nodemailer');
nodemailer.createTransport.mockReturnValue({
  sendMail: jest.fn().mockResolvedValue({ messageId: 'test-id' })
});

// ── Mock with a factory function (full control) ──────────────────
jest.mock('../db', () => ({
  query: jest.fn().mockResolvedValue([]),
  close: jest.fn()
}));

// ── Partial mock — keep some real, mock some ─────────────────────
jest.mock('../utils', () => ({
  ...jest.requireActual('../utils'),                // keep real implementations
  generateId: jest.fn().mockReturnValue('fixed-id') // replace only this one
}));

// ── Mock per-test (override for one test) ────────────────────────
beforeEach(() => jest.clearAllMocks()); // reset call counts between tests

it('handles DB error', async () => {
  db.query.mockRejectedValueOnce(new Error('Connection lost'));
  // Only this test sees the error; the next test gets the default mock
  await expect(getUsers()).rejects.toThrow('Connection lost');
});
```
```js
// ❌ BROKEN — variable declared before jest.mock(), but jest.mock() is hoisted above it
const mockFn = jest.fn();
jest.mock('../service', () => ({ doWork: mockFn }));
// The factory can run BEFORE `const mockFn` is initialised → undefined/ReferenceError

// ✅ FIXED — create jest.fn() inside the factory, capture the reference after
jest.mock('../service', () => ({ doWork: jest.fn() }));
const { doWork } = require('../service'); // capture after jest.mock runs

// OR use jest.mocked() for TypeScript-aware mock typing
import { doWork } from '../service';
const mockDoWork = jest.mocked(doWork); // typed mock with full autocomplete
```
How do you mock timers, Date.now(), and randomness in tests to make time-dependent code deterministic?

Tests that depend on real time have three problems: they are slow (a 5-second timeout test takes 5 seconds every run), they are flaky (a test that passes in 100 ms might fail under CI load at 101 ms), and they are non-deterministic (results change based on when the test runs). Jest's fake timers solve all three by giving you full control over the clock.
```js
// ── Fake timers — control setTimeout, setInterval, Date ──────────
beforeEach(() => {
  jest.useFakeTimers();
  jest.setSystemTime(new Date('2024-01-15T10:00:00Z')); // fixed clock
});
afterEach(() => jest.useRealTimers()); // ALWAYS restore after the test

it('debounces search after 300ms', () => {
  const handler = jest.fn();
  const debounced = debounce(handler, 300);
  debounced('a'); debounced('ab'); debounced('abc');
  expect(handler).not.toHaveBeenCalled(); // not fired yet

  jest.advanceTimersByTime(300); // jump the clock forward 300 ms instantly
  expect(handler).toHaveBeenCalledTimes(1);
  expect(handler).toHaveBeenCalledWith('abc'); // only the last call fired
});

it('generates a timestamp-based ID', () => {
  // Date.now() returns our fixed time — deterministic!
  expect(generateId()).toBe('id-1705312800000');
});

// ── Mocking Math.random ──────────────────────────────────────────
it('generates predictable tokens', () => {
  const spy = jest.spyOn(Math, 'random').mockReturnValue(0.5);
  expect(generateToken()).toBe('expected-value-for-0.5');
  spy.mockRestore();
});

// ── Running all pending timers ───────────────────────────────────
jest.runAllTimers();            // flush ALL pending timers (careful with recursion)
jest.runOnlyPendingTimers();    // safer — only currently queued timers
await jest.runAllTimersAsync(); // for async timers (Promises inside timeouts)
```
How do you test HTTP endpoints with supertest? Show a complete setup including authentication and database.

supertest wraps your Express app object and spins it up on an ephemeral port for the duration of each request. You make real HTTP requests against it without managing ports yourself, without the server running separately, and without any external network. The full middleware chain runs — routing, validation, auth, body parsing — just like production.
```js
const request = require('supertest');
const app = require('../app'); // import the app without .listen()
const { db } = require('../db');
const { sign } = require('jsonwebtoken');

describe('POST /api/users', () => {
  beforeAll(async () => { await db.migrate.latest(); });
  afterAll(async () => { await db.destroy(); });
  beforeEach(async () => { await db('users').truncate(); });

  it('returns 201 and the new user', async () => {
    const res = await request(app)
      .post('/api/users')
      .set('Content-Type', 'application/json')
      .send({ name: 'Alice', email: 'alice@example.com' });

    expect(res.status).toBe(201);
    expect(res.body).toMatchObject({ name: 'Alice', email: 'alice@example.com' });
    expect(res.body.id).toBeDefined();
    expect(res.body.password).toBeUndefined(); // must not leak hash
  });

  it('returns 400 for invalid email', async () => {
    const res = await request(app)
      .post('/api/users')
      .send({ name: 'Bob', email: 'not-an-email' });
    expect(res.status).toBe(400);
    expect(res.body.errors.email).toBeDefined();
  });

  it('returns 401 for unauthenticated admin route', async () => {
    const res = await request(app).delete('/api/users/1');
    expect(res.status).toBe(401);
  });

  it('deletes user when admin token provided', async () => {
    const token = sign({ role: 'admin' }, process.env.JWT_SECRET);
    const res = await request(app)
      .delete('/api/users/1')
      .set('Authorization', `Bearer ${token}`);
    expect(res.status).toBe(204);
  });
});
```
Key pattern: export the app object from app.js without calling listen(). Call listen() only in server.js. This lets supertest (and tests) import the app without starting a real server.
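A minimal sketch of that split — file names follow the convention just described:

```js
// app.js — defines the app; importing it has no side effects
const express = require('express');
const app = express();
app.get('/health', (req, res) => res.json({ ok: true }));
module.exports = app;

// server.js — the only place that binds a port
const app = require('./app');
app.listen(3000, () => console.log('listening on 3000'));

// health.test.js — tests import the app directly
const request = require('supertest');
const app = require('./app');
it('responds to health check', async () => {
  await request(app).get('/health').expect(200, { ok: true });
});
```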
How do you mock outgoing HTTP calls? Tools: nock, msw, and jest-fetch-mock.

If your code calls an external API (Stripe, Twilio, GitHub) and your tests actually make those calls, you have: slow tests (network latency), flaky tests (rate limits, downtime), billed API calls, and tests that break when the API changes independently of your code. You need to intercept outgoing HTTP at the network level.
```js
const nock = require('nock');

afterEach(() => nock.cleanAll()); // remove all interceptors after each test

it('creates a Stripe charge', async () => {
  // Intercept the real HTTP call at the socket level — no network
  nock('https://api.stripe.com')
    .post('/v1/charges')
    .reply(200, { id: 'ch_test123', status: 'succeeded' });

  const charge = await createCharge({ amount: 1000, currency: 'usd' });
  expect(charge.id).toBe('ch_test123');
});

it('handles Stripe errors gracefully', async () => {
  nock('https://api.stripe.com')
    .post('/v1/charges')
    .reply(402, { error: { message: 'Your card was declined.' } });

  await expect(createCharge({ amount: 1000 }))
    .rejects.toThrow('Your card was declined.');
});

// nock also supports: query params, request body matching, delays, repeat counts
nock('https://api.example.com')
  .get('/users')
  .query({ page: '2' }) // match a specific query string
  .delayConnection(200) // simulate a slow network
  .times(3)             // match up to 3 requests (set before .reply)
  .reply(200, [{ id: 2 }]);
```
```js
const { setupServer } = require('msw/node');
const { http, HttpResponse } = require('msw');

const server = setupServer(
  http.get('https://api.github.com/users/:username', ({ params }) => {
    return HttpResponse.json({ login: params.username, public_repos: 42 });
  })
);

beforeAll(() => server.listen());
afterEach(() => server.resetHandlers());
afterAll(() => server.close());

it('fetches GitHub profile', async () => {
  const profile = await getGithubProfile('torvalds');
  expect(profile.login).toBe('torvalds');
  expect(profile.public_repos).toBe(42);
});
```
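The third tool from the question, jest-fetch-mock, replaces the global fetch rather than intercepting sockets. A minimal sketch — `getUserViaFetch` and the URL are assumed fetch-based client code:

```js
// jest.setup.js — installs the mock once for the test run
require('jest-fetch-mock').enableMocks();

// In a test file — global fetch is now a jest mock function
beforeEach(() => fetch.resetMocks());

it('fetches a user via global fetch', async () => {
  fetch.mockResponseOnce(JSON.stringify({ id: 1, name: 'Alice' }));

  const user = await getUserViaFetch(1); // assumed helper using fetch
  expect(user.name).toBe('Alice');
  expect(fetch).toHaveBeenCalledWith('https://api.example.com/users/1');
});
```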
| Metric | Measures | Example |
|---|---|---|
| Statement | % of individual statements executed | const x = 1; was run at least once |
| Branch | % of if/else/ternary paths taken | Both the if path AND the else path were executed |
| Function | % of functions called at least once | Every function was invoked by some test |
| Line | % of source lines executed | Similar to statement but per source line |
```js
// jest.config.js
module.exports = {
  collectCoverage: true,
  coverageProvider: 'v8', // use V8's built-in coverage (faster)
  collectCoverageFrom: [
    'src/**/*.js',
    '!src/**/*.test.js', // exclude test files themselves
    '!src/migrations/**' // exclude generated/migration files
  ],
  coverageThreshold: {   // note: singular "Threshold"
    global: {
      statements: 80,    // CI fails if below 80%
      branches: 75,      // branches are harder to cover — set lower
      functions: 80,
      lines: 80
    },
    // Per-directory thresholds for critical business logic
    'src/payments/': { statements: 95, branches: 95 }
  },
  coverageReporters: ['text', 'html', 'lcov'] // lcov for CI integration (Codecov, SonarQube)
};
```
Code that is hard to test has the same root cause: it reaches out to external systems (databases, files, APIs, clocks) directly, without giving tests a way to substitute those dependencies. The fix is always the same: inject dependencies instead of instantiating them inside the function.
A chef who grows, harvests, and cooks all in one process is impossible to test — you cannot check just the cooking without the farm. A chef who accepts ingredients as inputs can be tested by handing them different ingredients. Dependency injection is handing the chef the ingredients.
```js
// ── 1. Hard-coded dependencies ───────────────────────────────────
// ❌ Hard to test — db is always the real database
const db = require('../db');
async function getUser(id) { return db.query('SELECT ...'); }

// ✅ Inject the dependency — tests can pass a fake
async function getUser(id, db = require('../db')) { ... }

// Or better — constructor injection:
class UserService {
  constructor(db) { this.db = db; } // injected from outside
  async getUser(id) { return this.db.query(...); }
}
// Test: new UserService(fakeDb) — no real DB needed

// ── 2. Global state ──────────────────────────────────────────────
// ❌ Hard to test — global state bleeds between tests
let counter = 0;
module.exports = { increment: () => ++counter, get: () => counter };

// ✅ Return state-holding closures or classes — each test gets a fresh instance
function createCounter() {
  let count = 0;
  return { increment: () => ++count, get: () => count };
}

// ── 3. Side effects in module scope (runs on require) ───────────
// ❌ Hard to test — this runs when you import the module
const server = http.createServer().listen(3000); // runs immediately!

// ✅ Export a factory — the caller decides when to start
function createServer() { return http.createServer(...); }
module.exports = { createServer }; // import is side-effect free

// ── 4. No separation of concerns (God function) ─────────────────
// ❌ One function validates + queries + transforms + logs + emails
//    To test validation you must deal with all the other concerns

// ✅ Split into small single-purpose functions — each testable alone
const validate = (input) => ...;      // pure — trivial to test
const transform = (raw) => ...;       // pure — trivial to test
const persist = (data, db) => ...;    // injectable db
const notify = (data, mailer) => ...; // injectable mailer
```
"Testable code has three properties: pure functions where possible (same input → same output, no side effects), dependency injection instead of hard-coded instantiation (pass db/mailer/clock as parameters so tests can substitute fakes), and separation of concerns (small functions with one job are trivially testable in isolation). The rule of thumb: if you find yourself wrestling with jest.mock() constantly, it means the production code has too many hidden dependencies — refactor to inject them instead."