MongoDB Mastery — Interview Prep

🍃

Segment 1 — Core Concepts & Document Model

The foundation of every MongoDB interview. Understanding the document model, BSON, ObjectId, and why MongoDB exists separates engineers who use MongoDB from those who understand it. These questions appear in virtually every MongoDB interview at any level.

Document Model BSON ObjectId Collections Schema Flexibility WiredTiger

Questions

Opened

Topics Covered

~40m

Study Time

🧠 Mental Model — What MongoDB Really Is

Before memorising APIs, lock in this mental model: MongoDB stores data the way your application thinks about it — as objects, not rows. The core insight is that your domain model and your storage model are the same shape.

📦 The Filing Cabinet Analogy

A SQL database is like a spreadsheet — every row must have the same columns. MongoDB is like a filing cabinet of folders. Each folder (document) can contain whatever papers (fields) it needs. Some folders are thick, some thin. You don't have to pre-define what goes inside each folder — you just put it in.

✅ MongoDB excels at

Rapidly evolving data shapes (product catalogs, CMS)
Hierarchical or nested data (orders with line items)
Horizontal scale-out across many servers
High write throughput applications
Real-time analytics on operational data

❌ MongoDB struggles with

Complex multi-table JOINs (use RDBMS instead)
Strict ACID across many unrelated collections
Highly relational data with many foreign keys
Reporting-heavy workloads (data warehouse is better)

Topic A — The Document Model vs SQL

What is MongoDB? How does it fundamentally differ from a relational database?

Easy Document Model NoSQL

▾

Core Definition

MongoDB is a document-oriented NoSQL database that stores data as flexible JSON-like documents (technically BSON). Unlike relational databases which organise data into rigid tables with rows and columns, MongoDB stores each record as a self-contained document that can have its own structure.

Terminology Mapping — SQL vs MongoDB

SQL Concept	MongoDB Equivalent	Key Difference
Database	Database	Same concept
Table	Collection	Collections have no enforced schema
Row	Document	Documents can differ in structure
Column	Field	Fields can be nested objects or arrays
Primary Key	_id field	Auto-generated ObjectId by default
JOIN	$lookup (aggregation)	Prefer embedding over joining
Foreign Key	Manual reference (DBRef)	Not enforced at DB level
Index	Index	Same concept, more types available

The Core Architectural Difference

Relational databases normalise data — splitting it across many tables to avoid duplication, then using JOINs to reconstruct it. MongoDB denormalises by design — related data lives together in one document, avoiding the JOIN cost at read time.

SQL vs MongoDB — same data

-- SQL: data split across 3 tables, need JOIN to reconstruct
SELECT o.id, u.name, p.title, oi.quantity
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id;

// MongoDB: one read, all data together
db.orders.findOne({ _id: orderId });
// Returns: { _id, user: { name, email }, items: [{ title, qty, price }] }

Why This Matters in Interviews

Interviewers want to know you understand the trade-off: embedding avoids JOINs but can lead to data duplication. The right answer shows you know MongoDB is a deliberate design choice, not just "a faster MySQL."

How to answer in an interview

"MongoDB is a document-oriented database that stores data as BSON documents — essentially flexible JSON objects. The fundamental difference from relational databases is that MongoDB avoids the table-column-row model in favour of documents that can contain nested data and arrays. This means you design schemas around your access patterns — the data your app reads together lives together — trading JOIN-heavy reads for potential data duplication."

What is BSON? How does it differ from JSON, and why does MongoDB use it?

Easy BSON

▾

What is BSON?

BSON (Binary JSON) is a binary-encoded serialisation of JSON-like documents. MongoDB stores documents in BSON format internally and transmits them over the wire in BSON. Your application driver converts between BSON and native language objects transparently.

JSON vs BSON — Key Differences

Property	JSON	BSON
Format	Text (UTF-8 string)	Binary encoded
Human-readable	Yes	No
Parse speed	Slower (text parsing)	Faster (binary, length-prefixed)
Data types	String, Number, Boolean, Array, Object, null	All JSON types + Date, Binary, ObjectId, Int32, Int64, Decimal128, Regex, Timestamp
Size	Smaller for simple data	Can be larger due to type metadata
Traversal	Must parse entire string	Skip fields using length prefix — fast

Types BSON Adds That JSON Lacks

MongoDB Shell / BSON types

{
  _id:        ObjectId("507f1f77bcf86cd799439011"), // 12-byte unique ID
  createdAt:  new Date("2026-01-15"),          // ISODate — NOT a string
  price:      Decimal128("19.99"),             // Precise decimal (no float errors)
  views:      NumberLong(9999999999),           // 64-bit integer
  data:       BinData(0, "base64string"),       // Binary blob (images, files)
  pattern:   RegExp(/^mongo/i),               // Stored regex
  isActive:  true                               // Boolean (same as JSON)
}

⚠ Common Gotcha — Dates

Always store dates as BSON Date objects, never as strings. BSON Date enables proper date comparison queries ($gt, $lt) and range queries. Storing "2026-01-15" as a string means alphabetical comparison — completely wrong for dates.

How to answer in an interview

"BSON is MongoDB's binary serialisation format — it's JSON's superset. The two key advantages over plain JSON are: firstly, it supports richer types like Date, Int64, Decimal128, and Binary that JSON doesn't have — this prevents precision issues with numbers and enables proper date queries. Secondly, BSON is length-prefixed so the driver can skip fields without parsing the entire document, making it faster to read."

What is a MongoDB document? What are its constraints and supported field types?

Easy Documents

▾

What is a Document?

A document is a set of key-value pairs stored in BSON format. It is the basic unit of data in MongoDB — analogous to a row in SQL but far more flexible. Documents can contain nested documents (subdocuments), arrays, and arrays of subdocuments.

Document Constraints

Maximum size: 16 MB per document (hard limit)
_id field: Every document must have a unique _id — auto-generated as ObjectId if not provided
Field names: Cannot start with $ or contain . (dot) — these are reserved for query operators
Nesting depth: No hard limit, but deeply nested documents are hard to query and index
Field order: Field order is preserved in documents (unlike most JSON parsers)

Real-World Document Example

MongoDB Document

{
  _id: ObjectId("507f1f77bcf86cd799439011"),   // auto-generated unique key
  name: "MongoDB Mastery",                       // String
  price: Decimal128("49.99"),                    // Precise decimal
  inStock: true,                                 // Boolean
  tags: ["mongodb", "nosql", "database"],        // Array
  publishedAt: new Date("2026-01-01"),           // BSON Date
  author: {                                       // Embedded subdocument
    name: "Yogesh Tiwari",
    email: "yogesh@example.com"
  },
  reviews: [                                      // Array of subdocuments
    { user: "alice", rating: 5, comment: "Great!" },
    { user: "bob",   rating: 4, comment: "Good"   }
  ],
  metadata: null                                 // null value
}

16MB Limit — What to Do

If a document might exceed 16MB (e.g., storing file content), use GridFS — MongoDB's specification for storing large files by splitting them into 255KB chunks stored as separate documents.

How to answer in an interview

"A MongoDB document is a BSON key-value structure — think of it as a rich JSON object. Key constraints: maximum 16MB, every document needs a unique _id, and field names can't use dots or start with dollar signs. What makes documents powerful is that field values can be strings, numbers, booleans, dates, arrays, or even other embedded documents — you can represent an entire entity with all its relationships in one place."

What is a Collection in MongoDB? How does it compare to a SQL table?

Easy Collections

▾

What is a Collection?

A collection is a grouping of MongoDB documents — the equivalent of a SQL table, but without a fixed schema. Collections are created automatically when you first insert a document into them. No pre-definition (CREATE TABLE) required.

Collection vs SQL Table

Feature	SQL Table	MongoDB Collection
Schema	Fixed — every row has the same columns	Dynamic — each document can differ
Data types per field	Enforced by column definition	Any BSON type per field per document
Adding new fields	ALTER TABLE (can lock table)	Just insert a document with the new field
Creation	Explicit CREATE TABLE	Auto-created on first insert
Joins	Native JOIN across tables	$lookup in aggregation (not native)
Validation	Column constraints, triggers	Optional JSON Schema validation

Capped Collections

A special collection type with a fixed size — when full, it overwrites the oldest documents. Useful for logs, event streams, and audit trails where you only care about recent data.

MongoDB Shell

// Create a capped collection — max 1000 docs or 5MB
db.createCollection("auditLog", {
  capped: true,
  size: 5242880,   // 5MB in bytes
  max:  1000        // optional: max document count
});

How to answer in an interview

"A collection is like a SQL table but schema-free — it groups related documents without enforcing a structure. The key differences: collections are created implicitly on first insert, documents within the same collection can have completely different fields, and adding new fields to documents doesn't require any schema migration. You can optionally add JSON Schema validation to a collection if you need structure enforcement."

Topic B — ObjectId & Identity

What is an ObjectId? How is it structured, and why is it better than SQL auto-increment?

Easy ObjectId

▾

ObjectId Structure — 12 Bytes Total

Bytes 0–3: Unix timestamp — seconds since epoch. This makes ObjectIds sortable by creation time without a separate createdAt field.

Bytes 4–8: Random value — generated once per process startup. Unique per machine and process.

Bytes 9–11: Incrementing counter — initialised to a random value, then incremented. Handles multiple inserts within the same second on the same process.

Working with ObjectIds

JavaScript (Node.js Driver)

const { ObjectId } = require('mongodb');

const id = new ObjectId();

console.log(id.toString());      // "507f1f77bcf86cd799439011"
console.log(id.getTimestamp()); // 2026-01-01T00:00:00.000Z — embedded creation time!

// IMPORTANT: always query with ObjectId, not a string
db.collection('users').findOne({ _id: new ObjectId("507f1f77bcf86cd799439011") });
// NOT: { _id: "507f1f77bcf86cd799439011" } ← this won't match!

ObjectId vs SQL Auto-increment

Feature	SQL AUTO_INCREMENT	MongoDB ObjectId
Distributed generation	❌ Requires central DB coordination	✅ Generated client-side — no round-trip
Predictability	❌ Sequential — easy to enumerate	✅ Randomly seeded — harder to guess
Embedded metadata	❌ Just a number	✅ Contains creation timestamp
Sharding-safe	❌ Conflicts across shards	✅ Globally unique across shards
Sort by creation	Yes (by value)	Yes (sort by _id gives time order)

⚠ Common Mistake — String vs ObjectId

Always convert string IDs to ObjectId before querying. { _id: "507f1f..." } will never match a document whose _id is an ObjectId. This is the #1 ObjectId bug beginners hit.

How to answer in an interview

"An ObjectId is a 12-byte unique identifier — 4 bytes for Unix timestamp, 5 random bytes unique per process, and 3 bytes for an incrementing counter. This design means ObjectIds are globally unique without coordination between servers, which is critical for sharding. A bonus: sorting by _id gives you documents in insertion order, and you can extract the creation timestamp from any ObjectId without a separate field."

Topic C — Schema Flexibility

What does "schema-less" mean in MongoDB? Is it truly schema-less?

Medium Schema Flexibility

▾

The "Schema-less" Misconception

MongoDB is often called "schema-less" but that's misleading. A more accurate term is schema-flexible or dynamic schema. MongoDB doesn't enforce a schema at the database level by default — but your application absolutely has a schema. It's just implicit in your code rather than enforced by the database engine.

📝 The Analogy

SQL is like a government form — every field is printed and you must fill them all in. MongoDB is like a blank notebook — you can write whatever you want, but your team still has conventions about what goes on each page. "Schema-less" means the notebook doesn't yell at you if you skip a section — it doesn't mean you don't have rules.

Schema Flexibility in Practice

MongoDB Shell — valid but messy

// All 3 can coexist in the same collection — MongoDB won't complain
db.users.insertMany([
  { name: "Alice", email: "alice@example.com", age: 30 },
  { name: "Bob",   phone: "+91-9999"                      },  // no email or age
  { fullName: "Charlie D", age: "twenty-five"           }   // age as string!
]);

Adding Schema Validation with $jsonSchema

MongoDB supports optional JSON Schema validation — you can enforce that all documents in a collection meet certain rules, while still allowing fields to be added freely.

MongoDB Shell — Schema Validation

db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "email"],
      properties: {
        name:  { bsonType: "string",  description: "must be a string" },
        email: { bsonType: "string",  pattern: "^.+@.+$"           },
        age:   { bsonType: "int",     minimum: 0, maximum: 150        }
      }
    }
  },
  validationAction: "error"  // "warn" just logs, "error" rejects invalid inserts
});

Best Practice

In production, always use JSON Schema validation or an ODM like Mongoose for schema enforcement. Pure "no schema" is fine for prototyping but leads to data quality issues at scale.

How to answer in an interview

"MongoDB is schema-flexible, not truly schema-less. The database doesn't enforce a schema by default — different documents in the same collection can have different fields. But in practice, your application has an implicit schema baked into your code. MongoDB lets you add JSON Schema validation rules to enforce required fields and data types when you need that discipline. The flexibility shines during rapid development and schema evolution — adding a new field is just inserting it, no ALTER TABLE required."

What are embedded documents vs references? When do you use each?

Medium Data Modeling

▾

The Two Approaches

Embedding (Denormalised)

Nest related data inside the parent document as a subdocument or array. Everything is in one place — one read gets everything.

Referencing (Normalised)

Store a reference ID to another collection. Related data lives in a separate document — requires a second read or $lookup.

Code Example — Both Approaches

MongoDB — Embedded vs Referenced

// ── EMBEDDING: address inside user ──
{
  _id: ObjectId("..."),
  name: "Alice",
  address: {                    // embedded — always fetched with user
    street: "123 Main St",
    city: "Chandigarh",
    zip: "160001"
  }
}

// ── REFERENCING: author as a separate document ──
// posts collection
{
  _id: ObjectId("post1"),
  title: "MongoDB Guide",
  authorId: ObjectId("user1")  // reference — lookup separately
}
// users collection
{ _id: ObjectId("user1"), name: "Alice", email: "alice@example.com" }

Decision Table — Embed or Reference?

Use Embedding When…	Use References When…
Data is always read together with the parent	Data is read independently of the parent
The child data belongs to one parent only (1:1)	The child is shared by many parents (N:N)
The child data doesn't change frequently	The child is updated often and independently
The array won't grow unboundedly	The array could grow very large (thousands+)
Atomic updates to parent + child are needed	You need to query/filter the child independently

⚠ The Unbounded Array Problem

Never embed a relationship that could grow without limit (e.g., all comments on a viral post). An unbounded embedded array will eventually push the document past the 16MB limit and degrade performance as the array grows. Use references for high-cardinality relationships.

How to answer in an interview

"The rule is: embed when you read together, reference when you read independently. Address in a user document is a classic embed — you always want it with the user. But a user's blog posts should be referenced — you might want posts without the user, and there could be thousands of them. The key constraints are: never embed arrays that could grow without bounds, and only embed if atomic updates to both parent and child are needed."

Topic D — MongoDB in Context

What are the main advantages and disadvantages of MongoDB?

Medium Trade-offs

▾

Advantages

Flexible schema: Rapid iteration without ALTER TABLE migrations. New fields just appear in new documents.
Developer-friendly data model: Documents map directly to objects in code — no ORM impedance mismatch.
Horizontal scaling (sharding): Distribute data across many servers natively — harder to do with SQL.
Rich query language: Supports ad hoc queries, aggregation pipeline, full-text search, geospatial queries, and time-series.
High write throughput: WiredTiger's document-level locking allows far more concurrent writes than table-level locking.
Native JSON ecosystem: Perfect for Node.js, REST APIs — data flows end-to-end without transformation.
Built-in replication: Replica sets provide automatic failover and read scaling out of the box.

Disadvantages

No native JOINs: $lookup exists but is slower than SQL JOINs — not ideal for relational data.
16MB document limit: Can't store large files directly (requires GridFS workaround).
Data duplication: Embedding denormalises data — updates to shared data require updating multiple documents.
Memory-hungry: WiredTiger cache defaults to 50% of RAM — needs generous memory allocation.
Multi-document transactions add overhead: Available since 4.0, but they're slower than single-document operations.
No declarative referential integrity: No foreign key constraints — your application must maintain consistency.

How to answer in an interview

"MongoDB's main advantages are flexible schema evolution, natural JSON data model for JavaScript apps, built-in horizontal scaling via sharding, and high write throughput from document-level locking. The downsides: it doesn't enforce referential integrity, multi-document JOINs are expensive, and denormalization means update anomalies require updating multiple documents. The right choice depends on whether your data is document-shaped and your access patterns favour reads-by-document over complex joins."

When should you choose MongoDB over a relational database (and vice versa)?

Medium Architecture

▾

Use MongoDB When…

Schema evolves rapidly: SaaS products early-stage, product catalogs with variable attributes
Data is hierarchical: Orders with line items, blog posts with comments, user profiles with nested settings
High write throughput: IoT telemetry, event logging, activity feeds, clickstream data
Horizontal scale is a requirement: Multi-region, petabyte-scale datasets
JavaScript / Node.js stack: JSON end-to-end — no serialisation overhead
Geospatial queries: Location-based apps, maps, proximity searches

Use a Relational Database When…

Data is highly relational: Financial systems, ERP, complex many-to-many relationships
Strict ACID across many entities: Banking, payments, inventory management
Complex reporting and analytics: Ad hoc SQL queries, BI tools, data warehouses
Referential integrity is non-negotiable: Foreign keys enforced at DB level
Legacy system integration: Most enterprise software speaks SQL

Real-World Hybrid Approach

Most large-scale applications use both. MongoDB for user profiles, content, and logs; PostgreSQL for financial transactions and billing; Redis for caching. Don't treat it as MongoDB vs SQL — treat it as the right tool for each workload.

How to answer in an interview

"I choose MongoDB when the data is document-shaped, the schema evolves frequently, or the access pattern is 'give me this whole entity.' I choose PostgreSQL or MySQL when data is highly relational with many foreign keys, when I need strict ACID guarantees across multiple entities, or when complex reporting queries are the primary workload. In practice most production systems use both — MongoDB for operational workloads and a relational DB for transactional data."

Topic E — Storage Engine

What is the WiredTiger storage engine? How does MongoDB store and manage data physically?

Hard Internals WiredTiger

▾

WiredTiger — The Default Since MongoDB 3.2

WiredTiger is MongoDB's default (and recommended) storage engine. It replaced the older MMAPv1 engine and brought massive improvements in concurrency, compression, and performance.

Key WiredTiger Features

Feature	Detail	Why It Matters
Document-level concurrency	MVCC (Multi-Version Concurrency Control) — each write gets its own version	Multiple writers can work simultaneously without blocking each other
Compression	Snappy (default), zlib, or zstd for data; prefix for indexes	Reduces disk usage by 50–80% — also reduces I/O
Cache	Internal cache: 50% of (RAM − 1GB) by default, min 256MB	Keeps hot data in memory for fast reads
Write-ahead log (journal)	All writes go to the journal before being applied	Crash recovery — no data loss on sudden shutdown
Checkpoints	Every 60 seconds, WiredTiger flushes a consistent snapshot to disk	Limits how far back journal replay needs to go after a crash

How Data Flows — From Insert to Disk

Write accepted — driver sends BSON document over the network.

Journal write — operation is written to the WiredTiger journal (WAL). This is what guarantees durability on crash.

Cache update — document enters WiredTiger's in-memory cache as a "dirty page."

Checkpoint — every 60s (or when cache pressure is high), dirty pages are flushed to the data files on disk.

Index updates — all relevant B-tree indexes are updated in the cache and eventually flushed to disk.

MongoDB Shell — Storage Engine Info

// Check the storage engine in use
db.serverStatus().storageEngine
// { name: "wiredTiger", ... }

// Check WiredTiger cache statistics
db.serverStatus().wiredTiger.cache

// Check compression (collection stats)
db.runCommand({ collStats: "users" })
// Shows: storageSize (compressed), totalIndexSize, wiredTiger.creationString

MVCC — Why Document-Level Locking Is Fast

MVCC means readers never block writers and writers never block readers. When a write happens, WiredTiger creates a new version of the document. Readers see the old version until the write commits. This is the same pattern PostgreSQL uses — far superior to MMAPv1's collection-level locking.

How to answer in an interview

"WiredTiger has been MongoDB's default storage engine since 3.2. Its key advantages are document-level concurrency via MVCC — readers and writers don't block each other — and compression, which cuts disk usage significantly. Data flows from insert → journal (for durability) → in-memory cache (dirty page) → disk (at checkpoint every 60s). The internal cache defaults to 50% of RAM, so MongoDB is memory-hungry by design. If you're memory-constrained, tune wiredTigerCacheSizeGB in your config."

✏️

Segment 2 — CRUD Operations & Queries

The day-to-day language of MongoDB. Mastering CRUD — insert, find, update, delete — and the rich query operator syntax is the baseline every interviewer tests. Understanding cursors, projections, update operators, and upsert separates junior from mid-level MongoDB engineers.

insertOne / insertMany find / Cursors Update Operators Query Operators Projections Upsert

Questions

Opened

Topics Covered

~45m

Study Time

🧠 Mental Model — CRUD as a Conversation

Think of CRUD as a four-phrase vocabulary: Create (insert), Read (find), Update (update/replace), Delete (delete). Every operation takes a filter (which documents to target) and optionally a payload (what to do). Get that pattern locked in and every method becomes predictable.

🗂️ The Librarian Analogy

MongoDB is like a librarian. Insert = add a new book to the shelf. Find = "show me all books by author X published after 2020." Update = go to specific books and change their status to "checked out." Delete = remove books from the shelf. The filter is your search criteria — the more precise it is, the less work the librarian does.

Key Rule — Filters First

Every write operation (update, delete) takes a filter document as its first argument. An empty filter {} matches all documents. Always double-check your filter before running updateMany or deleteMany in production — a wrong filter is not easily undone.

Topic A — Basic CRUD

What are the core CRUD operations in MongoDB? Explain each with syntax examples.

Easy CRUD

▾

Create — insertOne & insertMany

MongoDB Shell

// Insert a single document
db.users.insertOne({
  name: "Alice",
  email: "alice@example.com",
  age: 30,
  createdAt: new Date()
});
// Returns: { acknowledged: true, insertedId: ObjectId("...") }

// Insert multiple documents in one round-trip
db.users.insertMany([
  { name: "Bob",     email: "bob@example.com",     age: 25 },
  { name: "Charlie", email: "charlie@example.com", age: 35 }
], { ordered: false }); // ordered:false = continue on partial error

Read — find & findOne

MongoDB Shell

// Find one document matching filter
db.users.findOne({ email: "alice@example.com" });

// Find all documents matching filter
db.users.find({ age: { $gte: 25 } })
         .sort({ name: 1 })   // 1 = ascending, -1 = descending
         .limit(10)
         .skip(0);

// Count matching documents
db.users.countDocuments({ age: { $gte: 25 } });

Update — updateOne & updateMany

MongoDB Shell

// Update one matching document
db.users.updateOne(
  { email: "alice@example.com" },      // filter
  { $set: { age: 31, updatedAt: new Date() } } // update
);

// Update all matching documents
db.users.updateMany(
  { age: { $lt: 18 } },
  { $set: { isMinor: true } }
);

Delete — deleteOne & deleteMany

MongoDB Shell

// Delete one document
db.users.deleteOne({ email: "bob@example.com" });

// Delete all matching documents
db.users.deleteMany({ isActive: false });

// ⚠ deleteMany({}) deletes EVERY document in the collection!

How to answer in an interview

"MongoDB's CRUD maps to four method families: insertOne/insertMany for create, find/findOne for read, updateOne/updateMany/replaceOne for update, and deleteOne/deleteMany for delete. Every write operation takes a filter as its first argument — which documents to target — and all of them return an acknowledgement result. The key gotcha is always reviewing your filter before deleteMany or updateMany in production."

How does the find() cursor work? What methods can you chain on it?

Easy Cursors

▾

What is a Cursor?

find() does not immediately return all documents — it returns a cursor: a pointer to the result set on the server. Documents are fetched in batches (default 101 documents or 16MB, whichever comes first) as you iterate. This is critical for memory efficiency with large result sets.

📖 The Cursor Analogy

A cursor is like a bookmark in a very long book. MongoDB holds the page open on the server. You ask for pages one at a time — you don't print the whole book at once. If you don't read the cursor within 10 minutes (default cursor timeout), MongoDB closes it automatically.

Cursor Methods

Method	What it does	Example
.sort()	Order results by field(s). 1=asc, -1=desc	`.sort({ age: -1 })`
.limit()	Cap the number of documents returned	`.limit(20)`
.skip()	Skip N documents (for offset pagination)	`.skip(40)`
.projection()	Include/exclude fields in the result	`.project({ name:1, _id:0 })`
.count()	Return count of matching documents	`.count()`
.toArray()	Exhaust cursor into an in-memory array	`await cursor.toArray()`
.forEach()	Iterate documents one at a time	`await cursor.forEach(fn)`
.explain()	Show query execution plan	`.explain("executionStats")`

Cursor in Node.js — Two Iteration Patterns

JavaScript (Node.js Driver)

const cursor = db.collection('users').find({ age: { $gte: 18 } });

// Pattern 1: toArray() — loads all into memory (fine for small sets)
const users = await cursor.toArray();

// Pattern 2: async iterator — streams one at a time (large result sets)
for await (const user of cursor) {
  processUser(user); // handle each document without loading all into memory
}

// Always close the cursor when done if not exhausted
await cursor.close();

⚠ skip() Performance Warning

.skip(N) scans and discards N documents before returning results. For page 100 with 20 items per page, MongoDB scans 2,000 documents and throws away 1,980. For deep pagination, use cursor-based pagination — filter by the last seen _id instead.

How to answer in an interview

"find() returns a cursor — a lazy pointer to the result set — not the documents themselves. Documents are batched from the server as you iterate. You chain methods on the cursor before iterating: sort, limit, skip, project. In Node.js, use toArray() for small sets or an async for-await loop for large sets to avoid loading everything into memory. One important gotcha: skip() degrades at scale — prefer cursor-based pagination using the last seen _id."

What is the difference between updateOne, updateMany, and replaceOne?

Medium Update

▾

Side-by-Side Comparison

Method	Scope	Payload	Key Behaviour
updateOne	First matching document only	Update operators (`$set`, `$inc`…)	Modifies specific fields — untouched fields remain
updateMany	All matching documents	Update operators	Same as updateOne but applies to every match — not atomic across documents
replaceOne	First matching document only	A full replacement document (no operators)	Overwrites the entire document except `_id`

Code — The Critical Difference

MongoDB Shell

// Original document:
// { _id: 1, name: "Alice", email: "alice@x.com", age: 30, role: "admin" }

// updateOne — ONLY changes age, everything else untouched
db.users.updateOne(
  { _id: 1 },
  { $set: { age: 31 } }
);
// Result: { _id: 1, name: "Alice", email: "alice@x.com", age: 31, role: "admin" }

// replaceOne — ENTIRE document is replaced (only _id survives)
db.users.replaceOne(
  { _id: 1 },
  { name: "Alice", age: 31 }  // no $set — this IS the new document
);
// Result: { _id: 1, name: "Alice", age: 31 }  ← email and role are GONE!

⚠ replaceOne Without Operators

If you accidentally pass a replacement document to updateOne without an operator like $set, MongoDB will throw an error in newer versions — but in older versions it could silently replace the document. Always use update operators with updateOne/updateMany.

How to answer in an interview

"The key distinction is payload type. updateOne and updateMany take update operator expressions like $set — they surgically modify specific fields while leaving the rest untouched. replaceOne takes a plain document — it overwrites the entire document, keeping only the _id. Use replaceOne when you want to reset a document to a completely new state; use updateOne when you want to change specific fields only."

Explain the core update operators: $set, $unset, $inc, $push, $pull, $addToSet.

Medium Update Operators

▾

Field Update Operators

MongoDB Shell

// $set — add or overwrite specific fields
db.products.updateOne({ _id: 1 }, {
  $set: { price: 29.99, updatedAt: new Date() }
});

// $unset — remove a field entirely (value is irrelevant, use "")
db.products.updateOne({ _id: 1 }, {
  $unset: { legacyField: "" }
});

// $inc — atomically increment (or decrement with negative value)
db.products.updateOne({ _id: 1 }, {
  $inc: { stock: -1, soldCount: 1 }  // decrement stock, increment sales
});

// $rename — rename a field
db.users.updateMany({}, {
  $rename: { fullname: "name" }
});

Array Update Operators

MongoDB Shell

// $push — append an element to an array (allows duplicates)
db.posts.updateOne({ _id: 1 }, {
  $push: { tags: "mongodb" }
});

// $push with $each — append multiple elements at once
db.posts.updateOne({ _id: 1 }, {
  $push: { tags: { $each: ["nosql", "database"] } }
});

// $addToSet — append only if not already present (no duplicates)
db.posts.updateOne({ _id: 1 }, {
  $addToSet: { tags: "mongodb" }  // ignored if "mongodb" already exists
});

// $pull — remove all elements matching a condition
db.posts.updateOne({ _id: 1 }, {
  $pull: { tags: "nosql" }
});

// $pop — remove first (-1) or last (1) element of an array
db.posts.updateOne({ _id: 1 }, {
  $pop: { tags: -1 }  // remove first element
});

Quick Reference

$setSet/overwrite field value

$unsetRemove a field completely

$incAtomic increment/decrement

$renameRename a field

$pushAppend to array (dupes OK)

$addToSetAppend only if unique

$pullRemove matching elements

$popRemove first or last element

How to answer in an interview

"Update operators are the verbs inside MongoDB write operations. $set and $unset add or remove fields. $inc is the go-to for counters — it's atomic, so two concurrent increments won't collide. For arrays: $push appends (allows duplicates), $addToSet appends only if unique — ideal for tag lists. $pull removes matching elements. Multiple operators can be combined in one update call, and the whole thing is atomic at the single-document level."

Topic B — Query Operators

How do comparison query operators work? Explain $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin.

Easy Query Operators

▾

Comparison Operators — Reference Table

Operator	Meaning	SQL Equivalent
$eq	Equal to (default when no operator given)	`= value`
$ne	Not equal to	`<> value`
$gt	Greater than	`> value`
$gte	Greater than or equal to	`>= value`
$lt	Less than	`< value`
$lte	Less than or equal to	`<= value`
$in	Matches any value in an array	`IN (a, b, c)`
$nin	Does not match any value in an array	`NOT IN (a, b, c)`

Practical Examples

MongoDB Shell

// $eq — explicit (same as shorthand)
db.users.find({ age: { $eq: 30 } });
db.users.find({ age: 30 }); // shorthand — same result

// $gt / $lte — range query (users aged 18–65)
db.users.find({ age: { $gte: 18, $lte: 65 } });

// $in — match multiple values (like SQL IN)
db.users.find({ status: { $in: ["active", "pending"] } });

// $nin — exclude values
db.users.find({ role: { $nin: ["admin", "superuser"] } });

// $ne — not equal
db.orders.find({ status: { $ne: "cancelled" } });

// Combined — products priced between $10 and $50, not out of stock
db.products.find({
  price:  { $gte: 10, $lte: 50 },
  status: { $ne: "out_of_stock" }
});

$in on _id for Batch Lookups

$in is extremely useful for fetching multiple documents by ID in one query — much faster than looping and calling findOne per ID.
db.users.find({ _id: { $in: [id1, id2, id3] } })

How to answer in an interview

"Comparison operators are used inside filter documents to express conditions. { age: 30 } is shorthand for { age: { $eq: 30 } }. Range queries combine operators: { age: { $gte: 18, $lte: 65 } }. $in is the MongoDB equivalent of SQL's IN and is very efficient when targeting a set of known values — especially useful for batch ID lookups."

What are logical query operators? Explain $and, $or, $not, and $nor.

Medium Query Operators

▾

Logical Operators Overview

Operator	Meaning	SQL Equivalent
$and	All conditions must be true	`WHERE A AND B`
$or	At least one condition must be true	`WHERE A OR B`
$not	Inverts the condition (wraps a single operator expression)	`WHERE NOT A`
$nor	None of the conditions must be true	`WHERE NOT (A OR B)`

Code Examples

MongoDB Shell

// $and — implicit (comma-separated conditions default to AND)
db.users.find({ age: { $gte: 18 }, isActive: true }); // implicit $and

// $and — explicit (required when same field appears multiple times)
db.products.find({
  $and: [
    { price: { $gte: 10 } },
    { price: { $lte: 50 } }
  ]
});
// Shorthand: { price: { $gte: 10, $lte: 50 } } (same result)

// $or — match users who are admins OR have premium subscription
db.users.find({
  $or: [
    { role: "admin" },
    { subscription: "premium" }
  ]
});

// $not — users whose age is NOT less than 18
db.users.find({ age: { $not: { $lt: 18 } } });
// Note: $not wraps an operator expression, not a plain value

// $nor — exclude both banned and suspended users
db.users.find({
  $nor: [
    { status: "banned" },
    { status: "suspended" }
  ]
});

⚠ $and vs Implicit AND

In most cases, a comma-separated filter is an implicit $and and you don't need the explicit operator. You need explicit $and only when you have two conditions on the same field that can't be merged — for example when using two different operators on the same field that conflict.

How to answer in an interview

"Logical operators combine multiple conditions. Comma-separated conditions in a filter are already an implicit $and. You use explicit $and only when you need two separate conditions on the same field. $or is the most common — use it when any one of several conditions should match. $not wraps an operator expression (not a value), and $nor is essentially NOT (A OR B) — neither condition can match."

Topic C — Arrays & Embedded Documents

How do you query arrays and embedded (nested) documents in MongoDB?

Medium Arrays Nested Docs

▾

Querying Arrays

MongoDB Shell

// Sample document: { tags: ["mongodb", "nosql", "database"], scores: [85, 92, 78] }

// Match documents where array CONTAINS a value
db.posts.find({ tags: "mongodb" }); // finds docs where tags array includes "mongodb"

// Match documents where array equals EXACTLY this array (same order)
db.posts.find({ tags: ["mongodb", "nosql"] }); // exact match, order matters

// $all — array must contain ALL of these values (any order)
db.posts.find({ tags: { $all: ["mongodb", "nosql"] } });

// $size — array has exactly N elements
db.posts.find({ tags: { $size: 3 } });

// $elemMatch — at least one element matches ALL conditions
db.students.find({
  scores: { $elemMatch: { $gte: 90, $lt: 100 } }
});

Querying Embedded (Nested) Documents — Dot Notation

MongoDB Shell

// Sample document: { address: { city: "Chandigarh", zip: "160001" } }

// Dot notation to query nested fields
db.users.find({ "address.city": "Chandigarh" });
db.users.find({ "address.zip": { $in: ["160001", "160017"] } });

// Array of subdocuments — $elemMatch for subdoc conditions
// Sample: { orders: [{ product: "Book", qty: 2 }, { product: "Pen", qty: 5 }] }
db.users.find({
  orders: {
    $elemMatch: { product: "Book", qty: { $gte: 2 } }
  }
});

// Update a specific array element by index using dot notation
db.users.updateOne(
  { _id: 1 },
  { $set: { "address.city": "Delhi" } }
);

$elemMatch vs Dot Notation on Arrays

Without $elemMatch, conditions on array subdocuments are evaluated independently — any element can satisfy one condition while a different element satisfies another. $elemMatch ensures a single element satisfies all conditions simultaneously. Always use $elemMatch when querying arrays of objects with multiple conditions.

How to answer in an interview

"To query arrays: a plain equality filter like { tags: 'mongodb' } checks if the array contains that value. $all requires all listed values to be present. $elemMatch is key for arrays of subdocuments — it ensures all conditions match within a single array element. For nested documents, use dot notation: 'address.city': 'Chandigarh'. Dot notation also works in update operators: $set: { 'address.zip': '160001' }."

Topic D — Projections, Upsert & Return Modifiers

What is a projection in MongoDB? How do you include and exclude fields?

Easy Projections

▾

What is a Projection?

A projection is the second argument to find() that controls which fields are returned. Instead of fetching entire documents, you specify exactly which fields your application needs — reducing network payload and memory usage.

Inclusion vs Exclusion Mode

MongoDB Shell

// INCLUSION mode (1) — only return specified fields + _id
db.users.find({}, { name: 1, email: 1 });
// Returns: { _id: ..., name: "Alice", email: "alice@x.com" }

// Suppress _id explicitly (the only field you can mix include + exclude)
db.users.find({}, { name: 1, email: 1, _id: 0 });
// Returns: { name: "Alice", email: "alice@x.com" }

// EXCLUSION mode (0) — return everything EXCEPT specified fields
db.users.find({}, { password: 0, __v: 0 });
// Returns all fields except password and __v

// Nested field projection with dot notation
db.users.find({}, { "address.city": 1, name: 1 });

// Array element projection — return only first matching element
db.posts.find(
  { tags: "mongodb" },
  { "tags.$": 1 }  // $ projects only the first matching array element
);

⚠ You Cannot Mix Inclusion and Exclusion

You cannot have both 1s and 0s in the same projection, except for _id. { name: 1, age: 0 } is invalid and throws an error. Pick one mode — include what you want, or exclude what you don't want.

Why Projections Matter for Performance

Projections reduce the amount of data transferred from MongoDB to your application. For a collection with 50 fields per document, fetching only 3 needed fields can cut network traffic by 90%. Combined with an index, a projection that only requests indexed fields creates a covered query — the fastest possible read (no document access needed, answered entirely from the index).

How to answer in an interview

"A projection is the second argument to find() — it controls what fields are returned. You're in either inclusion mode (specify what you want with 1) or exclusion mode (specify what you don't want with 0). You can't mix them except for _id, which can always be suppressed with _id: 0 even in inclusion mode. Projections matter for performance — they reduce network payload and are the key ingredient in covered queries."

What is upsert in MongoDB? When would you use it?

Easy Upsert

▾

What is Upsert?

Upsert is a combined update + insert operation. When { upsert: true } is passed as an option:

If a document matches the filter → it is updated normally.
If no document matches the filter → a new document is inserted using the filter fields as its initial values.

This eliminates the need for a read-then-write pattern, making it atomic and safe under concurrency.

Code Example

MongoDB Shell

// Upsert: update if exists, insert if not
db.pageViews.updateOne(
  { url: "/home" },                          // filter
  { $inc: { views: 1 }, $set: { lastSeen: new Date() } }, // update
  { upsert: true }                             // option
);
// First time: inserts { url: "/home", views: 1, lastSeen: Date }
// Every subsequent call: increments views by 1

// $setOnInsert — set fields ONLY on insert, not on update
db.users.updateOne(
  { email: "alice@example.com" },
  {
    $set: { lastLogin: new Date() },
    $setOnInsert: { createdAt: new Date(), role: "user" }
  },
  { upsert: true }
);

Common Upsert Use Cases

View/hit counters — increment counter, create if first visit
User session tracking — update session, insert if new user
Idempotent data sync — sync data from an external source without duplicates
Shopping cart — update item quantity if in cart, add if not
Config records — ensure a config document always exists with defaults

How to answer in an interview

"Upsert is { upsert: true } on an update call. If the filter matches a document, it updates it. If not, it inserts a new document using the filter fields plus the update payload. It's atomic — no race condition between checking existence and inserting. The companion operator $setOnInsert lets you set fields like createdAt only during the insert path, leaving them untouched on subsequent updates."

What is findOneAndUpdate? How does it differ from updateOne, and when would you use it?

Medium Atomic Operations

▾

The Core Difference

Method	Returns	When to use
updateOne	Result object: `{ matchedCount, modifiedCount }`	When you only care that the update happened — don't need the document back
findOneAndUpdate	The matched document itself (before or after update)	When you need the document AND want the update to happen atomically in one round-trip
findOneAndReplace	The matched document (before or after replace)	Same, but replaces the entire document
findOneAndDelete	The deleted document	Delete and get the document back atomically

Code Example — Before vs After

JavaScript (Node.js Driver)

// Returns the document BEFORE the update (default)
const before = await db.collection('orders').findOneAndUpdate(
  { status: "pending", assignedTo: null },   // filter
  { $set: { assignedTo: "worker-1", status: "processing" } },
  { sort: { createdAt: 1 } }                 // pick oldest first
);

// Returns the document AFTER the update
const after = await db.collection('users').findOneAndUpdate(
  { email: "alice@example.com" },
  { $inc: { loginCount: 1 } },
  { returnDocument: "after", upsert: true }  // returnDocument: "after" returns updated doc
);

The Classic Use Case — Job Queue / Task Claiming

findOneAndUpdate is the backbone of atomic job claiming — grab the next available task and mark it as taken in a single atomic operation, with no risk of two workers claiming the same task:

JavaScript (Node.js)

async function claimNextJob(workerId) {
  return db.collection('jobs').findOneAndUpdate(
    { status: "queued" },
    { $set: { status: "running", workerId, startedAt: new Date() } },
    { sort: { priority: -1, createdAt: 1 }, returnDocument: "after" }
  );
  // If null returned — no jobs available. Otherwise the claimed job.
}

How to answer in an interview

"findOneAndUpdate does what updateOne does but returns the document itself rather than just a result summary — and it does both in a single atomic operation. The key option is returnDocument: 'after' to get the post-update state. The classic use case is a job queue: find the next available job and mark it as claimed in one atomic call, preventing two workers from claiming the same task."

🔄

Segment 3 — Aggregation Pipeline

The aggregation pipeline is MongoDB's most powerful feature for data transformation and analytics. Understanding pipeline stages, stage ordering for performance, and composing real-world multi-stage pipelines is tested in virtually every senior MongoDB interview.

$match / $project $group / Accumulators $lookup $unwind $facet Performance

Questions

Opened

Topics Covered

~50m

Study Time

🧠 Mental Model — The Assembly Line

The aggregation pipeline is exactly what it sounds like: a series of stages that documents flow through, each stage transforming the data before passing it to the next. Think of it as a factory assembly line where raw materials (documents) enter one end and finished goods (results) exit the other.

🏭 The Factory Floor Analogy

Stage 1 ($match) is the loading dock — it only accepts materials that meet your specification, rejecting the rest. Stage 2 ($group) is the sorting machine — groups items by category and counts them. Stage 3 ($sort) is quality control — orders the output. The key insight: each stage gets smaller input than the previous one when you filter early. A factory that pre-sorts 1,000 items is slower than one that first rejects 900 at the loading dock and only sorts 100.

Pipeline Stage Order Matters for Performance

Always put $match and $sort (on indexed fields) as early as possible. Stages that reduce document count ($match) or size ($project) should precede expensive stages ($group, $lookup).

Topic A — Pipeline Fundamentals

What is the MongoDB aggregation pipeline? How does it differ from a simple find() query?

Easy Pipeline Stages

▾

What is the Aggregation Pipeline?

The aggregation pipeline is a framework for data transformation and computation. Documents pass through a sequence of stages, each stage transforming the documents and passing the result to the next stage. It is analogous to Unix pipes: collection | stage1 | stage2 | stage3 → result.

find() vs Aggregation — Key Differences

Capability	find()	Aggregation Pipeline
Filter documents	✅ Yes (query)	✅ Yes ($match)
Shape output	Limited (projection)	✅ Full reshape ($project, $addFields)
Group & compute	❌ No	✅ Yes ($group + accumulators)
Join collections	❌ No	✅ Yes ($lookup)
Compute new fields	❌ No	✅ Yes ($addFields, $project expressions)
Sort & paginate	✅ Yes (sort/skip/limit)	✅ Yes ($sort/$skip/$limit stages)
Multi-pipeline output	❌ No	✅ Yes ($facet)

Core Pipeline Stages — Quick Reference

Stage	Purpose	SQL Analogue
$match	Filter documents	WHERE
$project	Reshape documents (include/exclude/compute fields)	SELECT
$group	Group by key and compute aggregates	GROUP BY
$sort	Sort documents	ORDER BY
$limit	Keep first N documents	LIMIT
$skip	Skip first N documents	OFFSET
$lookup	Left outer join from another collection	LEFT JOIN
$unwind	Deconstruct array — one doc per element	CROSS JOIN UNNEST
$addFields	Add/overwrite fields without replacing the document	SELECT *, new_col
$facet	Run multiple sub-pipelines on same input	No direct analogue

Basic Pipeline Syntax

MongoDB Shell — pipeline syntax

db.orders.aggregate([
  // Stage 1: filter — uses index on (status, createdAt)
  { $match: { status: "completed", createdAt: { $gte: new Date("2024-01-01") } } },
  // Stage 2: group by customer, sum revenue
  { $group: { _id: "$customerId", total: { $sum: "$amount" }, count: { $sum: 1 } } },
  // Stage 3: sort by total descending
  { $sort: { total: -1 } },
  // Stage 4: top 5 customers only
  { $limit: 5 }
]);

How to answer in an interview

"The aggregation pipeline is MongoDB's data transformation engine. Documents flow through a sequence of stages — each stage filters, reshapes, or computes data before passing results to the next. Unlike find(), which only filters and projects, the pipeline can group, join collections with $lookup, compute aggregates, and run multiple parallel sub-pipelines with $facet. It replaced the older MapReduce API, which was deprecated in MongoDB 5.0."

What does $match do? Why must it come first, and what are its performance rules?

Easy $match Performance

▾

What $match Does

$match is the pipeline equivalent of a find() query filter. It accepts any valid MongoDB query predicate and keeps only documents that pass the condition. Operators like $eq, $gt, $in, $regex, and $text all work inside $match.

Why $match Must Be First — Index Usage

When $match is the first stage, MongoDB can use an index to scan only relevant documents, exactly like a find(). Placed after any other stage, MongoDB must process all documents through the preceding stages first — losing index access entirely.

Index Rule

A $match on an indexed field at pipeline position [0] = index scan (fast). A $match at position [2] after $unwind = full scan of the unwound in-memory set with no index benefit.

$match Examples

MongoDB Shell — $match patterns

// Simple equality
{ $match: { status: "active" } }

// Range filter — uses compound index on (status, createdAt)
{ $match: {
  status: "completed",
  createdAt: { $gte: new Date("2024-01-01"), $lt: new Date("2025-01-01") }
} }

// $expr — compare two fields within the same document
{ $match: { $expr: { $gt: ["$revenue", "$cost"] } } }

// Array contains element (uses multikey index)
{ $match: { tags: "featured" } }

// $text search (requires text index)
{ $match: { $text: { $search: "mongodb aggregation" } } }

Adjacent $match Stages — Optimizer Merges Them

MongoDB automatically merges consecutive $match stages into a single match before execution — so splitting complex filters for readability has no performance cost.

MongoDB Shell — coalesced by optimizer

// Two readable stages — optimizer merges into one:
[{ $match: { status: "active"  } },
 { $match: { age:    { $gte: 18 } } }]

// Executed as:
[{ $match: { status: "active", age: { $gte: 18 } } }]

How to answer in an interview

"$match is the pipeline's filter stage — it accepts any query predicate and passes only matching documents downstream. The critical rule: put $match first so MongoDB can use an index at the collection level, just like find(). Placed later, $match operates on in-memory intermediate results with no index benefit. MongoDB's optimizer also merges adjacent $match stages automatically, so you can split complex matches for readability without a performance penalty."

How does $group work? What accumulator operators are available?

Medium $group Accumulators

▾

How $group Works

$group groups input documents by a key (the _id field) and computes aggregate values over each group using accumulator operators. Every unique value of _id produces exactly one output document. Setting _id: null groups all documents into a single group — computing a grand total.

$group Syntax

MongoDB Shell — $group anatomy

{ $group: {
  _id: "<grouping expression>",  // null = single group (grand total)
  field1: { <accumulator>: "<expression>" },
  field2: { <accumulator>: "<expression>" }
} }

// Example: revenue per product category
{ $group: {
  _id:          "$category",
  totalRevenue: { $sum: "$price" },
  avgPrice:     { $avg: "$price" },
  orderCount:   { $sum: 1 },       // count docs in group
  maxPrice:     { $max: "$price" },
  products:     { $push: "$name" }  // collect into array
} }

Accumulator Operators Reference

Accumulator	Description	Example
$sum	Sums numeric values; use 1 to count documents	{ $sum: "$amount" } or { $sum: 1 }
$avg	Arithmetic average of numeric values	{ $avg: "$rating" }
$min	Minimum value in the group	{ $min: "$price" }
$max	Maximum value in the group	{ $max: "$price" }
$push	Collects all values into an array (allows duplicates)	{ $push: "$tag" }
$addToSet	Collects unique values into an array (no duplicates)	{ $addToSet: "$category" }
$first	First value in the group (meaningful after $sort)	{ $first: "$createdAt" }
$last	Last value in the group (meaningful after $sort)	{ $last: "$updatedAt" }
$count	Counts documents in group (MongoDB 5.0+)	{ $count: {} }
$mergeObjects	Merges documents in the group into one object	{ $mergeObjects: "$address" }

Multi-Level Grouping

JavaScript (Node.js) — two-pass grouping

// First group: daily totals per user
// Second group: monthly total from daily buckets
db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: {
    _id: {
      userId: "$userId",
      day: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } }
    },
    dailyTotal: { $sum: "$amount" }
  }},
  { $group: {
    _id:          "$_id.userId",
    monthlyTotal: { $sum: "$dailyTotal" },
    activeDays:   { $sum: 1 }
  }},
  { $sort: { monthlyTotal: -1 } }
]);

Memory Warning

$group is a blocking stage — it must process all input documents before emitting any output. For very large datasets, add { allowDiskUse: true } to spill to disk when the 100 MB in-memory limit is exceeded.

How to answer in an interview

"$group collapses multiple documents into one per unique _id value, applying accumulator operators to compute aggregates per group. The _id field is the grouping key — set it to null to get a single grand total. The most-used accumulators are $sum (sum or count with 1), $avg, $min/$max, $push (array of all values), and $addToSet (unique values only). $group is a blocking stage — all input must be processed before any output is emitted — so use allowDiskUse: true for large aggregations."

Topic B — Reshaping & Joining

What is $project? How do you include, exclude, and compute fields?

Easy $project $addFields

▾

What $project Does

$project is the pipeline's SELECT clause — it controls which fields appear in output documents. It can include fields (set to 1), exclude fields (set to 0), and compute new fields using expressions. Unlike find() projection, aggregation $project can create entirely new fields using arithmetic, string, date, or conditional expressions.

Include, Exclude, and Compute

MongoDB Shell — $project patterns

// Include specific fields (_id included by default)
{ $project: { name: 1, email: 1, createdAt: 1 } }

// Exclude _id (common pattern)
{ $project: { _id: 0, name: 1, email: 1 } }

// Compute a new field — fullName from first + last
{ $project: {
  fullName:  { $concat: ["$firstName", " ", "$lastName"] },
  isAdult:   { $gte: ["$age", 18] },
  discounted:{ $multiply: ["$price", 0.9] }
} }

// Rename a field
{ $project: { userId: "$_id", _id: 0 } }

// Conditional field — $cond (ternary)
{ $project: {
  tier: { $cond: { if: { $gt: ["$spend", 1000] }, then: "gold", else: "standard" } }
} }

$project vs $addFields

Stage	Behaviour	Use When
$project	Only fields explicitly listed survive — everything else is dropped	You want to select specific columns (like SELECT col1, col2)
$addFields	All existing fields pass through AND specified fields are added/overwritten	You want to add computed fields without dropping existing ones
$set	Alias for $addFields (MongoDB 4.2+)	Identical to $addFields — choose for readability

Early $project = Smaller Pipeline

Adding a $project early to strip unused fields reduces the memory footprint of documents flowing into later expensive stages (especially $lookup and $group). If each document shrinks from 5 KB to 500 bytes, a 1M-document pipeline uses 10× less memory mid-stream.

How to answer in an interview

"$project is the pipeline's field selector and reshaper. Set fields to 1 to include, 0 to exclude, or to an expression to compute a new value. Unlike find() projection, it can create entirely new fields using MongoDB expressions — string concat, arithmetic, date math, conditionals. Use $addFields when you want to add computed fields without re-listing every existing field you want to keep."

How does $lookup work? Explain simple and pipeline-based joins with their limitations.

Medium $lookup Joins

▾

What $lookup Does

$lookup performs a left outer join to another collection in the same database. For each input document, it fetches matching documents from the foreign collection and appends them as an array field. If no match exists, the array is empty — but the local document is still included (left outer join semantics).

Form 1 — Simple Equality Lookup

MongoDB Shell — simple $lookup

// Join orders → users on orders.userId = users._id
db.orders.aggregate([
  { $lookup: {
    from:         "users",   // target collection
    localField:   "userId", // field in orders
    foreignField: "_id",    // field in users
    as:           "user"    // output array field
  }},
  // Flatten single-element array → object
  { $unwind: { path: "$user", preserveNullAndEmptyArrays: true } }
]);
// Result: { _id, amount, user: { name, email, ... } }

Form 2 — Pipeline Lookup (MongoDB 3.6+)

The pipeline form lets you filter, project, and transform joined documents before embedding — far more powerful than simple equality:

MongoDB Shell — pipeline $lookup

db.orders.aggregate([
  { $lookup: {
    from: "products",
    let:  { pid: "$productId", minQty: "$quantity" }, // expose local fields
    pipeline: [
      { $match: { $expr: { $and: [
          { $eq:  ["$_id",    "$$pid"]    },
          { $gte: ["$stock", "$$minQty"] }  // only in-stock
      ]}}},
      { $project: { name: 1, price: 1, _id: 0 } }
    ],
    as: "product"
  }}
]);

$lookup Performance Rules

Rule	Why It Matters
Index the foreignField	Without an index, $lookup does a full collection scan for each input doc — O(n×m) complexity
$match before $lookup	Reduce input set first — join fewer documents, not all of them
$project before $lookup	Strip large fields from input docs so joined documents consume less memory
Same database only	$lookup cannot join across databases; use Atlas Data Federation for cross-db joins

$lookup is Not Free

Each $lookup runs a sub-query against the foreign collection per input document. 10,000 input documents = up to 10,000 sub-queries. Always filter before joining and index the foreign join field.

How to answer in an interview

"$lookup performs a left outer join — for each input document, it queries a foreign collection and embeds matching documents as an array. The simple form uses localField/foreignField for equality joins. The pipeline form lets you pass local variables via 'let' and filter/project the joined documents before embedding — much more flexible. Key constraints: both collections must be in the same database, index the foreignField, and $match before $lookup to avoid N×M scan complexity."

What is $unwind? When would you use it and what are its gotchas?

Medium $unwind Arrays

▾

What $unwind Does

$unwind deconstructs an array field — it outputs one document per array element, each with the same fields as the original but with the array field replaced by a single element. An order with 3 line items becomes 3 documents after $unwind: "$items".

Basic Usage

MongoDB Shell — $unwind effect

// Input: { _id: 1, items: ["book", "pen", "notebook"] }

{ $unwind: "$items" }

// Output — 3 documents:
// { _id: 1, items: "book"     }
// { _id: 1, items: "pen"      }
// { _id: 1, items: "notebook" }

// Full options form
{ $unwind: {
  path:                       "$items",
  includeArrayIndex:          "itemIndex",  // adds 0-based position field
  preserveNullAndEmptyArrays: true          // keep doc if array is [] or null
} }

Classic Pattern — $unwind then $group

Unwind normalises an embedded array into rows, then group re-aggregates. Example — count how many products carry each tag:

MongoDB Shell — tag frequency

db.products.aggregate([
  { $unwind: "$tags" },                               // one doc per tag
  { $group:  { _id: "$tags", count: { $sum: 1 } } },  // count per tag
  { $sort:  { count: -1 } },
  { $limit: 10 }
]);

The Three $unwind Gotchas

Gotcha	What Happens	Fix
Empty array []	Document is silently dropped from output	preserveNullAndEmptyArrays: true
Null or missing field	Document is silently dropped from output	preserveNullAndEmptyArrays: true
Document explosion	Array with 1,000 elements = 1,000 output docs; blows up memory	$match early to reduce input; $project to trim array before unwinding

How to answer in an interview

"$unwind deconstructs an array field into separate documents — one per array element. It's how you pivot embedded arrays into rows you can group or filter on. The classic pattern is $unwind then $group: normalize the array, then re-aggregate. Key gotchas: documents with null or empty arrays are silently dropped unless you set preserveNullAndEmptyArrays: true. Also watch for document explosion — a large array multiplies your document count significantly, so $match early to limit the input set before unwinding."

Topic C — Advanced & Performance

What is $facet? How does it run multiple aggregations on the same input in one query?

Hard $facet Multi-Pipeline

▾

What $facet Does

$facet processes input documents through multiple independent sub-pipelines simultaneously, collecting each sub-pipeline's output into a named array field. All sub-pipelines receive the same input — so you get multiple aggregation results from a single collection scan.

Classic Use Case — Search Results with Faceted Navigation

JavaScript (Node.js) — faceted product search

const [result] = await db.collection('products').aggregate([
  // Filter once — same set flows into ALL facets
  { $match: { category: "electronics", inStock: true } },

  { $facet: {
    // Paginated product list
    products: [
      { $sort:    { price: 1 } },
      { $skip:    0 },
      { $limit:   20 },
      { $project: { name: 1, price: 1, thumbnail: 1 } }
    ],
    // Total count for pagination controls
    totalCount: [
      { $count: "total" }
    ],
    // Brand breakdown for sidebar filter
    byBrand: [
      { $group: { _id: "$brand", count: { $sum: 1 } } },
      { $sort:  { count: -1 } }
    ],
    // Price histogram for range slider
    priceRanges: [
      { $bucket: {
        groupBy:    "$price",
        boundaries: [0, 50, 100, 250, 500, 1000],
        default:    "1000+",
        output:     { count: { $sum: 1 } }
      }}
    ]
  }}
]).toArray();

// result.products    → 20 products for page 1
// result.totalCount  → [{ total: 342 }]
// result.byBrand     → [{ _id: "Apple", count: 45 }, ...]
// result.priceRanges → [{ _id: 0, count: 12 }, { _id: 50, count: 38 }, ...]

$facet Limitations

Limitation	Detail
No $out or $merge inside	Sub-pipelines cannot write to collections
No nested $facet	Sub-pipelines cannot themselves contain $facet stages
100 MB memory limit	Each sub-pipeline subject to the limit; use allowDiskUse: true for large inputs
Input buffered once	The matched set is held in memory before fanning out to sub-pipelines

How to answer in an interview

"$facet lets you run multiple independent sub-pipelines on the same input simultaneously. The key benefit: you $match once, then compute a product page + total count + sidebar filters in a single round-trip. Without $facet, that's 3–4 separate queries. The canonical use case is a search results page — one $facet call returns the product list, total for pagination, category breakdown, and price histogram all at once."

How do you optimize aggregation pipeline performance? What are the key rules?

Hard Performance explain()

▾

The 5 Performance Rules

Rule	Why	Example
1. $match first	Enables index use at collection level; reduces docs flowing into later stages	Date/status filters before $group or $lookup
2. $sort early on indexed fields	A $sort on an indexed field before $limit uses a top-K sort; avoids full in-memory sort	{ $sort: { createdAt: -1 } } then { $limit: 100 }
3. $project early to trim docs	Smaller documents = less memory in $lookup and $group stages	Drop unused fields before $lookup
4. Index the $lookup foreignField	Without index, each $lookup does a full foreign collection scan — O(n×m)	Ensure users._id is indexed (it always is — use _id)
5. allowDiskUse for large sets	$group and $sort are limited to 100 MB RAM; disk spill avoids OOM	db.orders.aggregate([...], { allowDiskUse: true })

Automatic Optimizer Coalescences

Optimization	What Happens
Adjacent $match merging	Two consecutive $match stages merged into one
$sort + $limit coalescence	$sort followed by $limit uses a top-K heap (keeps only K docs in memory)
$match pushdown after $lookup	A $match immediately after $lookup may be pushed into the $lookup pipeline to filter joined docs earlier
$project + $match reorder	$match predicates on unmodified fields can be moved before $project

Using explain() on a Pipeline

MongoDB Shell — explain aggregation

db.orders.explain("executionStats").aggregate([
  { $match: { status: "completed", createdAt: { $gte: new Date("2024-01-01") } } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
]);

// Key things to look for in output:
// winningPlan.inputStage.stage → "IXSCAN" (good) or "COLLSCAN" (bad)
// executionStats.nReturned vs totalDocsExamined
//   → ratio near 1.0 = efficient; large gap = missing index

The Golden Order — $match → $project → $lookup → $group

Filter the smallest possible set, trim to minimum fields, then join and group. This order consistently gives the best performance for analytics pipelines.

How to answer in an interview

"The most impactful optimisation is $match first so MongoDB uses an index — without this, every stage processes the entire collection. Then project away unused fields before $lookup to reduce memory per document. Index the foreignField used in $lookup or each lookup becomes a full collection scan. For large aggregations, add allowDiskUse: true to prevent 100 MB memory limit errors. Use explain('executionStats') to verify IXSCAN vs COLLSCAN and check nReturned vs totalDocsExamined."

How do you paginate results using the aggregation pipeline? What are the trade-offs?

Medium Pagination $skip / $limit

▾

Method 1 — $skip + $limit (Offset Pagination)

JavaScript (Node.js) — offset pagination

const page     = 3;
const pageSize = 20;

db.collection('products').aggregate([
  { $match:  { category: "books" } },
  { $sort:   { createdAt: -1 } },
  { $skip:   (page - 1) * pageSize },  // skip first (page-1)*pageSize docs
  { $limit:  pageSize }
]);

Method 2 — $facet with $count (Page + Total in One Query)

JavaScript (Node.js) — pagination + total count

const [result] = await db.collection('products').aggregate([
  { $match: { category: "books" } },
  { $sort:  { createdAt: -1 } },
  { $facet: {
    data:  [{ $skip: 40 }, { $limit: 20 }],  // page 3
    total: [{ $count: "count" }]
  }}
]).toArray();

const total      = result.total[0]?.count ?? 0;
const totalPages  = Math.ceil(total / 20);
const data        = result.data;

Method 3 — Cursor / Keyset Pagination (Scalable)

JavaScript (Node.js) — keyset pagination

// Use the last doc's indexed fields as the cursor
db.collection('products').aggregate([
  { $match: {
    category: "books",
    $or: [
      { createdAt: { $lt: lastDate } },
      { createdAt: lastDate, _id: { $lt: lastId } }  // tie-break
    ]
  }},
  { $sort:  { createdAt: -1, _id: -1 } },
  { $limit: 20 }
]);

Comparison

Method	Pros	Cons	Best For
$skip + $limit	Simple; supports random page access	Slow on large offsets — scans and discards skipped docs	Small datasets, admin UIs
$facet + $count	Page + total in one round-trip	Same $skip performance issue; buffers all matched docs	UI requiring total page count
Keyset / Cursor	O(log n) via index; consistent across inserts/deletes	No random page access; more complex query	Infinite scroll, high-volume APIs

How to answer in an interview

"There are three patterns. $skip + $limit is simplest but degrades on large offsets — page 100 at 20 per page means scanning 1,980 docs to discard them. Use $facet to get data and total count in one round-trip for paginated UIs. For high-volume APIs or infinite scroll, cursor-based pagination using a keyset on indexed fields is the only O(log n) approach — it doesn't degrade as the dataset grows and is stable when documents are inserted between pages."

Walk me through a complete real-world pipeline: monthly revenue by category with top products.

Hard Real-World Multi-Stage

▾

The Problem

Given an orders collection where each document has a status, createdAt date, and an embedded items array (productId, name, category, qty, price) — build a report showing: total revenue per category per month, unique order count, average order value, and the top 3 products by revenue in each category.

Sample Document

MongoDB — orders collection shape

{
  _id:        ObjectId("..."),
  status:     "completed",
  createdAt:  new Date("2024-03-15"),
  items: [
    { productId: "p-1", name: "Laptop", category: "Electronics", qty: 1, price: 999 },
    { productId: "p-2", name: "Mouse",  category: "Electronics", qty: 2, price: 29  }
  ]
}

The Complete Pipeline

JavaScript (Node.js) — full analytics pipeline

const report = await db.collection('orders').aggregate([

  // 1. Filter — uses index on (status, createdAt)
  { $match: {
    status:    "completed",
    createdAt: { $gte: new Date("2024-01-01"), $lt: new Date("2025-01-01") }
  }},

  // 2. Extract month from createdAt
  { $addFields: {
    month: { $dateToString: { format: "%Y-%m", date: "$createdAt" } }
  }},

  // 3. Unwind items — one doc per line item
  { $unwind: "$items" },

  // 4. Compute line-item revenue (qty × price)
  { $addFields: {
    "items.revenue": { $multiply: ["$items.qty", "$items.price"] }
  }},

  // 5. Group by month + category
  { $group: {
    _id:          { month: "$month", category: "$items.category" },
    totalRevenue: { $sum:      "$items.revenue" },
    orderIds:     { $addToSet: "$_id" },       // unique orders
    products:     { $push: {
      name: "$items.name", id: "$items.productId", revenue: "$items.revenue"
    }}
  }},

  // 6. Derive counts and averages from grouped data
  { $addFields: {
    orderCount:   { $size: "$orderIds" },
    avgOrderValue: { $divide: ["$totalRevenue", { $size: "$orderIds" }] }
  }},

  // 7. Sort products array and take top 3
  { $addFields: {
    top3Products: {
      $slice: [
        { $sortArray: { input: "$products", sortBy: { revenue: -1 } } },
        3
      ]
    }
  }},

  // 8. Shape the final output
  { $project: {
    _id:           0,
    month:         "$_id.month",
    category:      "$_id.category",
    totalRevenue:  { $round: ["$totalRevenue", 2] },
    orderCount:    1,
    avgOrderValue: 1,
    top3Products:  1
  }},

  // 9. Final sort — chronological, highest revenue first within month
  { $sort: { month: 1, totalRevenue: -1 } }

], { allowDiskUse: true }).toArray();

Stage-by-Stage Explanation

Stage	Input → Output	Why Here
$match	All orders → Q1 completed only	Uses index; reduces volume first
$addFields (month)	Orders → + month string	Extract grouping key from date before unwinding
$unwind	Orders with items[] → one doc per line item	Normalize to compute per-item revenue
$addFields (revenue)	Line items → + revenue field	Compute qty × price before grouping
$group	Line items → one doc per month + category	Aggregate revenue; collect products array
$addFields (counts)	Grouped docs → + orderCount, avgOrderValue	Derive metrics from the orderIds set
$addFields (top3)	Grouped docs → + top3Products	$sortArray + $slice sorts and trims in-place
$project	Enriched docs → clean output shape	Remove internal fields, rename _id compound
$sort	Clean docs → ordered results	Final presentation order

How to answer in an interview

"I'd walk through it stage by stage: $match first with indexed fields, then $addFields to extract the month, then $unwind to explode the items array so I can compute per-line revenue. Then $group by month + category — collecting unique order IDs as a set for accurate counting and $push to collect product data. Then $addFields to derive order count and average from the set. Then $addFields again using $sortArray + $slice to get the top 3 products by revenue. Finally $project to clean the output shape and $sort for presentation. allowDiskUse: true handles the $group stage on large datasets."

⚡

Segment 4 — Indexing & Performance

Indexes are the single biggest lever on MongoDB performance. Knowing which index type to use, how the query planner selects indexes, and how to diagnose slow queries with explain() is essential at every senior and staff engineering interview.

Index Types Compound / ESR Rule Covered Queries explain() TTL / Text / Partial Query Planner

Questions

Opened

Topics Covered

~50m

Study Time

🧠 Mental Model — The Index as a Sorted Bookmark List

An index is a separate B-tree data structure that MongoDB maintains alongside your collection. Think of a book's index at the back: instead of reading every page to find "authentication," you jump straight to page 47. Without an index, MongoDB reads every document (COLLSCAN). With an index, it jumps to the matching entries (IXSCAN) and follows pointers back to the full documents.

📚 The Phone Book Analogy

A collection without indexes is a phone book with names in random order — to find "Smith, John" you read every entry. An index on lastName, firstName is the alphabetically sorted phone book — you jump to S, then to SM, then to Smith in milliseconds. A compound index is a phone book sorted by city first, then by last name — perfect if you always search by city, but useless if you only search by last name.

Indexes Are Not Free

Every index must be updated on every write (insert, update, delete). A collection with 10 indexes pays 10× the write overhead. Index only the fields you actually query — not every field "just in case."

Topic A — Index Types

What types of indexes does MongoDB support? Describe each with its use case.

Easy Index Types

▾

Index Types Reference

Type	Created When	Use Case
Single Field	db.col.createIndex({ field: 1 })	Queries filtering or sorting on one field
Compound	db.col.createIndex({ a: 1, b: -1 })	Queries filtering/sorting on multiple fields together
Multikey	Auto-created when indexed field is an array	Queries on array fields (tags, categories)
Text	db.col.createIndex({ bio: "text" })	Full-text keyword search with stemming and stop-word removal
Geospatial 2dsphere	db.col.createIndex({ loc: "2dsphere" })	Geo queries — $near, $geoWithin on GeoJSON data
Hashed	db.col.createIndex({ _id: "hashed" })	Even distribution for hashed shard keys; equality queries only
TTL	db.col.createIndex({ createdAt: 1 }, { expireAfterSeconds: N })	Auto-delete documents after N seconds (logs, sessions, caches)
Partial	db.col.createIndex({ field: 1 }, { partialFilterExpression: {...} })	Index only a subset of documents — smaller, faster index
Sparse	db.col.createIndex({ field: 1 }, { sparse: true })	Index only documents where the field exists (skip nulls)
Wildcard	db.col.createIndex({ "$**": 1 })	Dynamic/unpredictable fields (CMS content, user-defined attributes)
Clustered	db.createCollection("col", { clusteredIndex: {...} })	Documents stored in _id order on disk — range scans on _id are faster

The Default Index

Every MongoDB collection has a unique index on _id created automatically. It cannot be dropped. ObjectIds encode creation time, so the default _id index also gives you a free chronological sort — db.col.find().sort({ _id: -1 }) returns most-recent-first without a separate createdAt index.

Creating & Managing Indexes

MongoDB Shell — index management

// Create — rolling background build in production
db.users.createIndex({ email: 1 }, { unique: true, name: "email_unique" })

// List all indexes on a collection
db.users.getIndexes()

// Get index sizes
db.users.stats().indexSizes

// Drop a specific index
db.users.dropIndex("email_unique")

// Hide an index (test impact before dropping — MongoDB 4.4+)
db.runCommand({ collMod: "users", index: { name: "email_unique", hidden: true } })

How to answer in an interview

"MongoDB supports single-field, compound, multikey (auto-created for array fields), text, geospatial 2dsphere, hashed, TTL, partial, sparse, and wildcard indexes. The most commonly used in production are compound indexes (to serve complex queries), TTL for automatic data expiry, partial indexes to index only a relevant subset, and text indexes for keyword search. Every collection gets a unique _id index automatically. In Atlas, Atlas Search provides far more powerful full-text capabilities than the native text index."

How do compound indexes work? What is the ESR rule and prefix rule?

Medium Compound Index ESR Rule

▾

What a Compound Index Is

A compound index indexes multiple fields together in a single B-tree, sorted by the first field, then within ties sorted by the second field, and so on. One compound index can serve queries that filter on any left-anchored prefix of the indexed fields — this is the prefix rule.

The Prefix Rule

MongoDB Shell — prefix rule

// Index: { status: 1, customerId: 1, createdAt: -1 }
db.orders.createIndex({ status: 1, customerId: 1, createdAt: -1 })

// ✅ Uses index — matches prefix (status)
db.orders.find({ status: "completed" })

// ✅ Uses index — matches prefix (status, customerId)
db.orders.find({ status: "completed", customerId: "c-1" })

// ✅ Uses index — full prefix match
db.orders.find({ status: "completed", customerId: "c-1" }).sort({ createdAt: -1 })

// ❌ Does NOT use index — skips status (first field)
db.orders.find({ customerId: "c-1" })

The ESR Rule — Optimal Field Ordering

When designing compound indexes, follow ESR: Equality → Sort → Range. Fields with equality predicates come first (highest selectivity, most nodes eliminated), followed by sort fields (to avoid in-memory sort), followed by range predicates (last because they scan a range of values).

MongoDB Shell — ESR ordering example

// Query: { status: "active", createdAt: { $gte: lastWeek } }, sort: { score: -1 }

// ❌ Wrong order — range before sort
db.users.createIndex({ status: 1, createdAt: 1, score: -1 })

// ✅ ESR order — equality(status), sort(score), range(createdAt)
db.users.createIndex({ status: 1, score: -1, createdAt: 1 })
// Result: index provides sort order (no in-memory sort) and covers range

Index Direction in Compound Indexes

For a single-field index, direction (1 vs -1) doesn't matter — MongoDB can traverse a B-tree in either direction. For compound indexes, direction matters when the combination of sort orders doesn't match the index direction. { a: 1, b: 1 } supports sort({ a: 1, b: 1 }) and sort({ a: -1, b: -1 }) but NOT sort({ a: 1, b: -1 }).

How to answer in an interview

"A compound index stores multiple fields in a single B-tree sorted left-to-right. The prefix rule means any query that matches the leftmost fields can use the index — but skipping the first field makes the index useless. When designing field order, follow the ESR rule: put equality predicates first (most selective, most nodes eliminated), then sort fields (avoids expensive in-memory sort), then range predicates last. This ordering lets the index both filter AND provide sort order for free."

What is a covered query? How do you create one and verify it with explain()?

Medium Covered Query explain()

▾

What a Covered Query Is

A covered query is one where MongoDB can satisfy the query entirely from the index — without fetching the actual documents from disk. This is the fastest possible query because it never touches the collection; it reads only the index B-tree. A query is covered when every field in the query filter and projection exists in the index.

Creating a Covered Query

MongoDB Shell — covered query setup

// Index contains: status, customerId, amount
db.orders.createIndex({ status: 1, customerId: 1, amount: 1 })

// ✅ COVERED — filter and projection only use indexed fields
//            _id must be excluded (it's not in this index)
db.orders.find(
  { status: "completed", customerId: "c-1" },  // filter
  { amount: 1, _id: 0 }                        // projection — exclude _id!
)

// ❌ NOT covered — requests a field not in the index
db.orders.find(
  { status: "completed" },
  { amount: 1, createdAt: 1, _id: 0 }  // createdAt not in index → fetch doc
)

Verifying Coverage with explain()

MongoDB Shell — verify covered query

db.orders.find(
  { status: "completed", customerId: "c-1" },
  { amount: 1, _id: 0 }
).explain("executionStats")

// Look for in executionStats:
// totalDocsExamined: 0   ← key indicator — no documents fetched
// totalKeysExamined: N   ← only index keys scanned
// stage: "PROJECTION_COVERED"  ← confirms covered execution

// If totalDocsExamined > 0 → NOT covered → check if projection
// requests any field outside the index (including _id by default)

The _id Gotcha

_id is returned by default in every find(). If _id is not in your compound index, the query cannot be covered unless you explicitly exclude it with { _id: 0 }. This is the most common reason a query that looks covered is actually not.

How to answer in an interview

"A covered query is one MongoDB satisfies entirely from the index, without touching the underlying documents — totalDocsExamined is 0 in explain(). To create one, every field in the filter and projection must be in the index. The classic gotcha is _id: MongoDB returns _id by default, and if _id isn't in your compound index, you must explicitly project it out with { _id: 0 } to achieve coverage. Covered queries are dramatically faster because they avoid disk I/O to the collection entirely."

How do you use explain() to analyze query performance? What are the key fields to inspect?

Medium explain() Query Planner

▾

Three explain() Verbosity Modes

Mode	Returns	Use When
queryPlanner (default)	Winning plan chosen by the planner — does NOT run the query	Fast check: which index would be used?
executionStats	Winning plan + actual execution metrics (docs examined, time)	Diagnose slow queries in dev/staging
allPlansExecution	All candidate plans + their execution stats	Deep analysis of why the planner chose plan A over plan B

Key Fields to Inspect

MongoDB Shell — reading explain output

const exp = db.orders.find({ status: "completed" }).explain("executionStats")

// ── queryPlanner section ──
exp.queryPlanner.winningPlan.stage
// "COLLSCAN" → no index used (bad for large collections)
// "IXSCAN"   → index scan (good)
// "FETCH"    → fetched docs from collection after IXSCAN
// "PROJECTION_COVERED" → served entirely from index (best)
// "SORT"     → in-memory sort (potential performance issue)

exp.queryPlanner.winningPlan.inputStage.indexName
// Which index was selected

// ── executionStats section ──
exp.executionStats.nReturned          // docs returned to caller
exp.executionStats.totalDocsExamined   // docs fetched from collection
exp.executionStats.totalKeysExamined   // index entries scanned
exp.executionStats.executionTimeMillis // total execution time

// Efficiency ratios to check:
// nReturned / totalDocsExamined → close to 1.0 = efficient
// nReturned / totalKeysExamined → close to 1.0 = good index selectivity
// totalDocsExamined = 0         → covered query (best possible)

Diagnosing Common Problems from explain()

Symptom in explain()	Root Cause	Fix
stage: COLLSCAN	No usable index for this query	Create an index matching the filter fields
totalDocsExamined >> nReturned	Index has low selectivity — scanning many docs to return few	Add more fields to the index to narrow the scan
stage: SORT (in-memory)	Sort field not in index or index direction mismatch	Add sort field to compound index (ESR rule)
stage: FETCH after IXSCAN	Projection requests fields not in the index	Add those fields to index for coverage, or accept the fetch
executionTimeMillis high despite IXSCAN	Index is large, or document fetch is slow (large docs)	$project to reduce doc size; consider covered query

Hint Index — Force the Planner

If the query planner picks the wrong index, use .hint({ field: 1 }) to force a specific index, then re-run explain() to compare execution stats. Never use hint in production code without measuring — the planner usually knows best.

How to answer in an interview

"explain('executionStats') is the primary tool. The two most important things to check are: the stage — IXSCAN is good, COLLSCAN means no index was used, and SORT means an expensive in-memory sort is happening. Then check the ratio of nReturned to totalDocsExamined — if you examined 50,000 docs to return 10, your index is poorly selective or the wrong fields are indexed. totalDocsExamined: 0 means a covered query, which is the best possible outcome."

Topic B — Specialised Index Types

What are TTL indexes? How do they work and what are their limitations?

Easy TTL Index Auto-Delete

▾

What TTL Indexes Do

A TTL (Time-To-Live) index automatically deletes documents after a specified number of seconds from a Date field. MongoDB's background TTL monitor thread checks the index every 60 seconds and removes expired documents. This is the native way to implement session expiry, log rotation, and cache invalidation without application-level cron jobs.

Creating TTL Indexes

MongoDB Shell — TTL index patterns

// Delete sessions 24 hours after createdAt
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 86400 }  // 24 × 60 × 60
)

// Delete at a specific time — set expiresAt to the exact expiry date
// expireAfterSeconds: 0 means "expire at the date stored in the field"
db.tokens.createIndex(
  { expiresAt: 1 },
  { expireAfterSeconds: 0 }
)
// Document: { token: "abc", expiresAt: new Date("2024-12-31") }
// → deleted after Dec 31 2024

// Change TTL on existing index (no need to drop/recreate)
db.runCommand({
  collMod: "sessions",
  index: { keyPattern: { createdAt: 1 }, expireAfterSeconds: 3600 }
})

TTL Limitations

Limitation	Detail
~60s precision	The TTL monitor runs every 60 seconds — documents expire within ~60s of their TTL, not exactly
Single Date field only	The indexed field must be a Date type or an array of Dates; other types are ignored
Not on compound indexes	TTL indexes must be single-field — you cannot create a compound TTL index
Not on capped collections	Capped collections do not support TTL indexes
High delete load	On very large collections, TTL deletes can create write pressure; monitor with db.currentOp()

How to answer in an interview

"A TTL index automatically deletes documents after N seconds from a Date field. It's built on a background thread that runs every ~60 seconds, so expiry is approximate not exact. The two patterns are: fixed duration (expireAfterSeconds: 86400 with a createdAt field) and exact expiry time (expireAfterSeconds: 0 with an expiresAt field set to the specific deletion date). Typical uses are session stores, password reset tokens, event logs, and cache documents. Key limitation: TTL must be a single-field index — you can't combine it with a compound index."

What is a multikey index? When is it created and what are its limitations?

Medium Multikey Index Arrays

▾

What a Multikey Index Is

MongoDB automatically creates a multikey index when an indexed field contains an array. Instead of storing one index entry per document, it creates one entry per array element — so a document with a tags: ["mongodb", "nosql", "database"] field generates 3 index entries. This allows efficient querying on array contents.

Multikey Index in Action

MongoDB Shell — multikey index

// Create index on tags field
db.products.createIndex({ tags: 1 })
// If any document has tags: ["a", "b"], this becomes a multikey index

// Documents:
// { _id: 1, tags: ["mongodb", "nosql"] }
// { _id: 2, tags: ["mongodb", "atlas"] }

// Index entries created:
// "atlas"   → doc 2
// "mongodb" → doc 1, doc 2
// "nosql"   → doc 1

// These queries all use the multikey index:
db.products.find({ tags: "mongodb" })              // ✅ element match
db.products.find({ tags: { $in: ["atlas", "nosql"] } })  // ✅ $in
db.products.find({ tags: { $all: ["mongodb", "nosql"] } }) // ✅ $all

Multikey Limitations

Limitation	Detail
Only one array field per compound index	A compound index { a: 1, b: 1 } cannot be multikey on both a AND b — only one of them can be an array
Cannot be used as shard key	Multikey indexes cannot be used as the shard key in a sharded cluster
Cannot be covered queries	Covered queries are not supported for multikey indexes
Index size grows with array size	A document with 1,000 array elements creates 1,000 index entries — watch for index bloat

Detecting a Multikey Index

Run db.collection.getIndexes() and look for "isMultiKey": true in the output, or check the multiKeyPaths field to see which array field triggered the multikey state.

How to answer in an interview

"A multikey index is automatically created when you index an array field — MongoDB creates one index entry per array element rather than one per document. This makes queries like { tags: 'mongodb' } efficient even though tags is an array. The key limitation is that a compound index can only have one multikey (array) field — you can't have two array fields in the same compound index. Also watch for index bloat on documents with very large arrays, since each element becomes a separate index entry."

What is a partial index? When would you use it instead of a standard index?

Medium Partial Index Sparse Index

▾

What a Partial Index Is

A partial index only indexes documents that match a specified partialFilterExpression. Documents that don't match the filter are excluded from the index entirely. This means the index is smaller, cheaper to build, faster to query, and uses less RAM than a full index on the same field.

Classic Use Cases

MongoDB Shell — partial index examples
// Use case 1: Index only active users (hot data subset)
// 10% of users are active — index only covers them, 10× smaller
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { status: "active" } }
)
// Query MUST include the filter expression to use the index:
db.users.find({ email: "alice@example.com", status: "active" }) // ✅ uses index
db.users.find({ email: "alice@example.com" })                   // ❌ COLLSCAN

// Use case 2: Sparse unique constraint — allow multiple docs with no email
// (sparse: true is a simpler alias for this specific pattern)
db.users.createIndex(
  { email: 1 },
  { unique: true, partialFilterExpression: { email: { $exists: true } } }
)
// Multiple docs with no email field can coexist; email uniqueness only
// enforced where the field exists

// Use case 3: Index completed orders only (most queries target completed)
db.orders.createIndex(
  { customerId: 1, createdAt: -1 },
  { partialFilterExpression: { status: "completed" } }
)
        

Partial vs Sparse — Key Difference

Type	Filters Out	Filter Expression
Sparse	Documents where the indexed field is absent or null	Implicit (no expression needed)
Partial	Documents that don't match any arbitrary expression	Explicit partialFilterExpression — much more flexible

Partial indexes are strictly more powerful than sparse indexes. The MongoDB docs recommend using partial indexes over sparse indexes for new code — you can replicate sparse behaviour with { $exists: true }.

How to answer in an interview

"A partial index only indexes documents matching a filter expression, skipping the rest. This makes it dramatically smaller and faster than a full index on the same field when your queries only ever target a subset of documents — like 'active users only' or 'completed orders only.' The catch: queries must include the filter expression in their predicate, otherwise MongoDB can't use the partial index and falls back to COLLSCAN. The most practical use case is indexing hot data — if only 10% of documents are 'active', the partial index is 10× smaller and fits more easily in RAM."

Topic C — Query Planner & Production Indexing

How does the MongoDB query planner select indexes? What is index intersection?

Hard Query Planner Index Intersection

▾

How the Query Planner Works

When MongoDB receives a query, the query planner evaluates candidate indexes. It runs trial executions of each candidate plan in parallel, scoring them by how many documents they examine. The plan that returns results fastest (least work) wins and is cached as the winning plan. The cache is invalidated when the collection grows significantly, indexes are added/dropped, or mongod restarts.

Plan Cache — How It Works

MongoDB Shell — plan cache management
// View cached plans for a collection
db.orders.getPlanCache().list()

// Clear the plan cache (forces re-evaluation)
db.orders.getPlanCache().clear()

// Force a specific index (overrides planner — use for diagnosis only)
db.orders.find({ status: "completed" }).hint({ status: 1, createdAt: -1 })
        

Index Intersection

Index intersection allows MongoDB to combine two separate single-field indexes to satisfy a query that filters on both fields — instead of requiring a compound index. The engine performs an IXSCAN on each index separately, then intersects the document ID sets.

MongoDB Shell — when intersection happens
// Two separate indexes:
db.orders.createIndex({ status: 1 })
db.orders.createIndex({ customerId: 1 })

// Query on both — planner MAY use AND_SORTED (index intersection)
db.orders.find({ status: "completed", customerId: "c-1" })

// explain() would show: stage: "AND_SORTED" or "AND_HASH"
// In practice, a compound index { status: 1, customerId: 1 } is
// almost always faster than intersection — prefer compound indexes.
        

Planner Decision — Compound vs Intersection

Approach	Pros	Cons
Compound index	Single IXSCAN; predictable; smaller RAM footprint; supports sorting	Must redesign if query patterns change; uses more index space upfront
Index intersection	Two small single-field indexes can serve many query shapes	Two IXSCAN operations + set intersection = more work; cannot provide sort order; planner may not choose it

How to answer in an interview

"The query planner runs candidate plans in parallel, measures which returns results with the least work, caches the winner, and uses that plan until the cache is invalidated. Index intersection lets MongoDB combine two single-field indexes to satisfy a multi-field query by intersecting their document ID sets — useful when you can't predict all query patterns. But in practice, a purpose-built compound index is almost always faster than intersection because it's a single B-tree traversal vs two scans plus an intersection step. Use intersection as a fallback, not a design strategy."

What are text indexes and $text search? How do they compare to Atlas Search?

Medium Text Index Full-Text Search

▾

Native Text Index — How It Works

A text index tokenises string fields, applies stemming (runs → run) and stop-word removal (the, a, is), and stores terms in a B-tree with relevance scores (tfidf). Queries use the $text operator with $search. Only one text index per collection is allowed, but it can cover multiple fields.

Creating and Using Text Indexes

MongoDB Shell — text index

// Create text index on title and body fields
db.articles.createIndex({ title: "text", body: "text" })

// Weight fields — title matches count 3× more than body matches
db.articles.createIndex(
  { title: "text", body: "text" },
  { weights: { title: 3, body: 1 } }
)

// Basic search
db.articles.find({ $text: { $search: "mongodb aggregation" } })

// Search with relevance score sorted
db.articles.find(
  { $text: { $search: "mongodb aggregation" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })

// Exact phrase (wrap in quotes)
db.articles.find({ $text: { $search: '"aggregation pipeline"' } })

// Exclude a word
db.articles.find({ $text: { $search: "mongodb -mapreduce" } })

Native Text Index vs Atlas Search

Feature	Native Text Index	Atlas Search (Lucene)
Engine	MongoDB B-tree with tfidf scoring	Apache Lucene (industry standard)
Relevance ranking	Basic tfidf	BM25 + custom scoring / boosting
Fuzzy search	❌ No	✅ Yes (typo tolerance)
Autocomplete	❌ No	✅ Yes
Faceted search	❌ No	✅ Yes
Synonyms	❌ No	✅ Yes
Index per collection	1 text index only	Multiple search indexes
Infrastructure	In-process (same mongod)	Separate search nodes (Atlas only)

When to Use Each

Use the native text index for simple keyword search on small-to-medium collections where you can't use Atlas. Use Atlas Search for production applications that need fuzzy matching, autocomplete, faceted navigation, or relevance tuning.

How to answer in an interview

"Native text indexes support keyword search with stemming, stop-word removal, and basic tfidf relevance scoring. They're limited to one index per collection and lack fuzzy search, autocomplete, and faceting. Atlas Search — built on Apache Lucene — provides production-grade full-text search with fuzzy matching, BM25 scoring, autocomplete, synonyms, and faceted navigation. For a real product search or content search feature, I'd choose Atlas Search. The native text index is fine for simple admin search or internal tooling where Atlas is not available."

How do you design indexes for a high-write, high-read production system? What are the trade-offs?

Hard Production Write Overhead

▾

The Core Trade-off

Every index speeds up reads but slows down writes. Each write operation (insert, update, delete) must update every index on the collection. A collection with 10 indexes pays 10× the index write overhead. In high-write systems, over-indexing is as dangerous as under-indexing.

Design Principles for Production Indexing

Principle	Detail
Index based on access patterns	Analyse your actual queries (slow query log, $currentOp, Profiler) before creating indexes — don't guess
Prefer compound over multiple singles	One compound index { a, b, c } is cheaper than three single-field indexes and serves more query shapes via prefixes
Remove unused indexes	Use db.collection.aggregate([ { $indexStats: {} } ]) to find indexes with ops: 0 and drop them
Build indexes in rolling fashion	On Atlas/replica sets, create indexes with createIndex() — MongoDB builds on secondaries first, then primary; no lock
Use partial indexes for hot data	If 90% of queries target "active" status, index only active docs — much smaller, faster, less write overhead
Monitor index cache hit rate	Indexes must fit in the WiredTiger cache (default: 50% of RAM). If indexes exceed RAM, MongoDB pages them in/out (slow)

Finding and Removing Unused Indexes

MongoDB Shell — index usage stats

// See how many times each index has been used since last restart
db.orders.aggregate([{ $indexStats: {} }])
// Output: { name, key, accesses: { ops, since } }
// ops: 0 → index never used → candidate for removal

// Check index sizes (are they fitting in RAM?)
db.orders.stats({ indexDetails: true }).indexSizes

// Hide index before dropping (MongoDB 4.4+ — safer than direct drop)
// Hidden indexes are maintained but not used by the planner
// Monitor for 1–2 weeks, then drop if no performance regression
db.runCommand({ collMod: "orders", index: { name: "old_index", hidden: true } })

// Enable the query profiler to find slow queries (level 2 = all queries)
db.setProfilingLevel(1, { slowms: 100 })  // log queries > 100ms
db.system.profile.find().sort({ millis: -1 }).limit(5)

Indexes and WiredTiger Cache

MongoDB's WiredTiger storage engine keeps frequently accessed indexes in memory (default: 50% of RAM). When the working set of indexes exceeds available RAM, MongoDB page-faults — fetching index pages from disk on each query. Symptoms: high page faults/s in mongostat, rising query latency despite IXSCAN plans. Fix: add RAM, reduce index count, or use partial indexes to shrink index size.

How to answer in an interview

"Index design is a balancing act: more indexes speed up reads but slow down every write and consume RAM. My approach is: start with the query profiler and slow query log to understand actual access patterns, then build compound indexes following ESR ordering to serve the most common queries. I use $indexStats to find zero-use indexes and hide them for 1–2 weeks before dropping, to avoid surprise regressions. For high-write collections, I keep the index count minimal — 3–5 max — and use partial indexes to shrink hot-path indexes. Everything must fit in the WiredTiger cache or you're just doing disk I/O with extra steps."

🏗️

Segment 5 — Schema Design & Data Modeling

MongoDB's greatest strength is flexible schema design — and its biggest footgun. Knowing when to embed vs reference, how to model 1:N and N:N relationships, and when to apply advanced patterns like Bucket, Extended Reference, and Outlier separates architects from developers.

Embed vs Reference 1:N / N:N Patterns Bucket Pattern Extended Reference Schema Validation Anti-Patterns

Questions

Opened

Topics Covered

~55m

Study Time

🧠 Mental Model — Design for Your Read Patterns

In relational databases you model data to third normal form (3NF) and let JOINs reconstruct it at query time. In MongoDB you model data for how your application reads it. The central question is not "how is this data related?" but "what does my application need to read together, and how often?"

📦 The Packing Analogy

Embedding is like packing a suitcase for one trip — everything you need is in one place, no fetching required. Referencing is like storing items in a warehouse with a catalogue — you look up the catalogue first, then fetch each item separately. The right choice depends on how often you travel and whether you always need all items together, or just some of them.

The Three Schema Design Questions

1. What data do you read together? → Embed it.
2. How often does the data change? → Frequently-changing data may be better referenced.
3. What is the cardinality? → Unbounded arrays inside a document are a red flag (16 MB limit).

Topic A — Embedding vs Referencing

When should you embed documents vs use references? What are the deciding factors?

Medium Embed vs Reference Data Modeling

▾

Embedding vs Referencing — Decision Table

Factor	Favour Embedding	Favour Referencing
Read together?	Data is almost always read together (order + items)	Data is often read independently (user profile vs orders)
Cardinality	One-to-few (address, phone numbers)	One-to-many or many-to-many (user → orders)
Update frequency	Sub-document rarely changes	Sub-document changes often and is shared across many parents
Duplication OK?	Some duplication is acceptable for read speed	Single source of truth is required
Document size	Embedded data stays well under 16 MB limit	Array could grow unbounded (tweets, log entries)
Atomic update	Need to update parent + child atomically (pre-4.0)	Multi-document transactions available (4.0+)

Embedding Example — Order with Line Items

MongoDB — embedded pattern

// ✅ Good embed: line items always read with the order
{
  _id: ObjectId("..."),
  customerId: "u-123",
  status: "completed",
  createdAt: new Date(),
  shippingAddress: {          // ✅ embed — always read with order
    street: "123 Main St",
    city: "Austin", zip: "78701"
  },
  items: [                     // ✅ embed — bounded, read together
    { productId: "p-1", name: "Laptop", qty: 1, price: 999 },
    { productId: "p-2", name: "Mouse",  qty: 2, price: 29  }
  ]
  // customerId is a REFERENCE — user profile read separately
}

Referencing Example — User → Orders

MongoDB — reference pattern

// users collection
{ _id: "u-123", name: "Alice", email: "alice@example.com" }

// orders collection — references user by ID
{ _id: ObjectId("..."), customerId: "u-123", amount: 1057 }

// ✅ Why reference works here:
//  - User profile updated independently of orders
//  - One user can have thousands of orders (unbounded)
//  - Order list page and user profile page read separately

How to answer in an interview

"The rule of thumb is: embed data that is read together and updated together; reference data that is read independently, changes frequently, or could grow unbounded. The 16 MB document limit is the hard constraint — any array that could grow without bound (all tweets from a user, all events in a log) must be referenced. For one-to-few relationships with bounded data (order line items, shipping address), embedding almost always wins because it eliminates a round-trip."

How do you model one-to-many (1:N) relationships in MongoDB? Show all three approaches.

Medium 1:N Relationship Data Modeling

▾

The Three Approaches to 1:N

MongoDB offers three ways to model one-to-many. The cardinality of the relationship (one-to-few, one-to-many, one-to-squillions) determines which to use.

Approach 1 — Embed Array (One-to-Few)

MongoDB — embed array (best for one-to-few)

// Person → phone numbers (always < 5, read together)
{
  _id: "u-1",
  name: "Alice",
  phones: [
    { type: "mobile", number: "555-0101" },
    { type: "work",   number: "555-0102" }
  ]
}
// ✅ Single read, atomic update, no join needed
// ❌ Cannot query phones independently

Approach 2 — Array of References (One-to-Many)

MongoDB — array of references (one-to-many)

// Publisher → books (dozens of books per publisher)
// publishers collection
{ _id: "pub-1", name: "O'Reilly", bookIds: ["b-1", "b-2", "b-3"] }

// books collection
{ _id: "b-1", title: "MongoDB: The Definitive Guide", publisherId: "pub-1" }

// ✅ Books queried independently; publisher list clean
// ❌ Need $lookup or two round-trips to get full book data
// ❌ bookIds array grows — risky if publisher has thousands of books

Approach 3 — Child Reference (One-to-Squillions)

MongoDB — child reference (unbounded)

// Host → log events (millions of events per host)
// hosts collection
{ _id: "host-1", hostname: "web-01", ip: "10.0.0.1" }

// logs collection — each log references its host
{ _id: ObjectId("..."), hostId: "host-1", level: "error", msg: "...", ts: new Date() }

// ✅ Scales to unlimited events per host
// ✅ Each log queryable and indexable independently
// ❌ Joining host info requires $lookup

Choosing the Right Approach

Cardinality	Pattern	Example
One-to-few (< ~20)	Embed array in parent	Person → phones, order → line items
One-to-many (dozens to hundreds)	Array of references in parent OR child reference	Publisher → books, user → orders
One-to-squillions (millions)	Child reference — store parent ID in child	Host → log events, user → activity feed

How to answer in an interview

"MongoDB gives three options for 1:N. For one-to-few (bounded, always read together), embed the array directly in the parent. For one-to-many (dozens), store an array of reference IDs on the parent — but this can bloat the parent document. For one-to-squillions (millions of children), flip it: store the parent's ID on each child document. The child-reference approach scales without bound and keeps the parent document small — essential for log events, activity feeds, or sensor readings."

How do you model many-to-many (N:N) relationships in MongoDB?

Medium N:N Relationship Data Modeling

▾

N:N — No Junction Table Needed

In relational databases, N:N requires a junction table. In MongoDB the most common approach is to store arrays of references on both sides — or just one side depending on which direction you query most. There's also a denormalised embed approach for small, stable relationship data.

Approach 1 — References on Both Sides

MongoDB — bidirectional references
// Students ↔ Courses (students enrol in many courses, courses have many students)

// students collection
{
  _id: "s-1",
  name: "Alice",
  courseIds: ["c-1", "c-2", "c-3"]  // which courses this student takes
}

// courses collection
{
  _id: "c-1",
  title: "MongoDB Mastery",
  studentIds: ["s-1", "s-2", "s-3"]  // which students are enrolled
}

// ✅ Find all courses for a student: index courseIds
db.students.find({ _id: "s-1" })  // get courseIds, then fetch courses

// ✅ Find all students in a course: index studentIds
db.courses.find({ _id: "c-1" })   // get studentIds, then fetch students

// ❌ Must update BOTH documents when a student enrols
        

Approach 2 — Reference on One Side Only

MongoDB — unidirectional reference

// Products ↔ Tags — query "all products with tag X" most often
// Store tag IDs on the product; tags collection stays clean

// products collection
{ _id: "p-1", name: "Laptop", tagIds: ["tag-nosql", "tag-hardware"] }

// tags collection
{ _id: "tag-nosql", label: "NoSQL", colour: "green" }

// ✅ "Products with tag X" → db.products.find({ tagIds: "tag-nosql" })
// ✅ Only one document to update on enrol
// ❌ "All tags for product X" needs $lookup or two queries

Approach 3 — Denormalised Embed with Key Attributes

MongoDB — partial embed (Extended Reference pattern)

// Orders ↔ Products — embed just the fields you display
{
  _id: ObjectId("..."),
  items: [
    {
      productId: "p-1",         // reference (for lookups)
      name: "Laptop",           // snapshot of name at purchase time
      price: 999               // snapshot of price at purchase time
    }
  ]
}
// ✅ Order page renders without any join
// ✅ Price is preserved as it was at purchase — correct
// ❌ Product name change doesn't update old orders — by design

How to answer in an interview

"MongoDB handles N:N without a junction table. The most common approach is referencing: store IDs on one side (or both sides) with a multikey index to query from either direction. If you query only one direction, put the array on the 'many' side you query most. For order line items, I use a partial embed — store both the reference ID and a snapshot of key fields (name, price) at the time of the relationship. This avoids joins at read time and correctly preserves historical data — an order should show the price when it was placed, not today's price."

Topic B — Advanced Schema Patterns

What is the Bucket Pattern? When and how do you use it?

Hard Bucket Pattern Time Series

▾

The Problem It Solves

When you have a high-frequency data stream (IoT sensors, stock ticks, metrics), storing one document per event creates millions of tiny documents — high index overhead, many B-tree nodes, slow range queries. The Bucket Pattern groups a fixed number of events into a single document, trading document count for document size.

One Document Per Event (Anti-Pattern for High Frequency)

MongoDB — naive approach (avoid at scale)

// 1 document per sensor reading — 1M readings/day = 1M tiny docs
{ sensorId: "s-1", ts: new Date(), temp: 22.3, humidity: 45 }
{ sensorId: "s-1", ts: new Date(), temp: 22.4, humidity: 46 }
// Problem: huge index overhead, slow hourly aggregations

Bucket Pattern — Group Events into Documents

MongoDB — bucket pattern (one doc per hour per sensor)

// One bucket = one hour of readings for one sensor
{
  _id: "s-1_2024-03-15T14",     // sensorId + hour as natural key
  sensorId: "s-1",
  hour: new Date("2024-03-15T14:00:00Z"),
  count: 60,                    // number of readings in this bucket
  sum: { temp: 1342, humidity: 2730 },  // running totals for avg
  min: { temp: 21.8, humidity: 44   },
  max: { temp: 23.1, humidity: 48   },
  readings: [                   // raw data — up to 60 per bucket
    { ts: new Date("2024-03-15T14:00:00Z"), temp: 22.3, humidity: 45 },
    { ts: new Date("2024-03-15T14:01:00Z"), temp: 22.4, humidity: 46 }
  ]
}

// Insert a new reading — $push + update running totals atomically
db.sensors.updateOne(
  { _id: "s-1_2024-03-15T14", count: { $lt: 60 } },
  {
    $push: { readings: { ts: new Date(), temp: 22.5, humidity: 47 } },
    $inc:  { count: 1, "sum.temp": 22.5, "sum.humidity": 47 },
    $min:  { "min.temp": 22.5 },
    $max:  { "max.temp": 22.5 }
  },
  { upsert: true }
)

Benefits & Trade-offs

Aspect	Naive (One Doc/Event)	Bucket Pattern
Document count	1M docs / day / sensor	~24 docs / day / sensor (1 per hour)
Index size	Huge — one entry per event	Small — one entry per bucket
Hourly average query	$group over 60 docs	sum.temp / count — single doc read
Write pattern	Simple insert	updateOne with upsert — slightly complex
Raw data access	Direct	Requires $unwind or array indexing

MongoDB Time Series Collections (5.0+)

MongoDB 5.0 introduced native Time Series collections which implement the bucket pattern automatically — no manual bucket management needed. For new IoT/metrics projects, prefer db.createCollection("sensors", { timeseries: { timeField: "ts", metaField: "sensorId" } }) over manual bucketing.

How to answer in an interview

"The Bucket Pattern solves the problem of storing high-frequency time series data. Instead of one document per event (millions of tiny docs with massive index overhead), you group N events into a single bucket document — say, one bucket per sensor per hour. The bucket stores raw readings as an array plus pre-computed running aggregates (sum, min, max, count) so hourly averages are a single document read rather than a $group over 60 documents. MongoDB 5.0+ Time Series collections implement this pattern natively, so for new projects I'd use those instead of manual bucketing."

What is the Extended Reference Pattern? When does it outperform $lookup?

Hard Extended Reference Denormalization

▾

What It Is

The Extended Reference Pattern is a deliberate, controlled form of denormalisation. Instead of storing just a foreign key (reference), you embed a snapshot of the most-read fields from the referenced document alongside the ID. This eliminates the $lookup on the hot read path while keeping a pointer to the source of truth for anything else.

Pure Reference vs Extended Reference

MongoDB — pure reference vs extended reference

// ❌ Pure reference — order list page needs JOIN to show customer name
{
  _id: ObjectId("..."),
  customerId: "u-123",      // reference only
  amount: 1057
}

// ✅ Extended reference — order list page renders without any join
{
  _id: ObjectId("..."),
  customer: {
    _id:   "u-123",          // keep the reference
    name:  "Alice Smith",    // snapshot — displayed on every order
    email: "alice@example.com" // snapshot — used for notifications
    // NOT: address, preferences, etc. — only what THIS doc needs
  },
  amount: 1057,
  status: "completed"
}

// When customer's name changes — update only recent/open orders
// Historical orders can keep the old name (which may be intentional)

What Fields to Embed

Only embed fields that are read in every query for this document AND change rarely. A customer's name changes very infrequently — safe to embed. A customer's loyalty points change daily — keep as pure reference only.

Field	Embed?	Reason
Customer name	✅ Yes	Shown on every order card; rarely changes
Customer email	✅ Yes	Used for order confirmation; rarely changes
Customer address	Snapshot only	Embed the address AT ORDER TIME — historical accuracy
Customer loyalty points	❌ No	Changes on every purchase — always look up live value
Customer phone	❌ Usually no	Not shown on order cards; wasted space

Handling Updates to Embedded Data

JavaScript (Node.js) — propagating name change

// When a customer changes their name, update recent open orders
await db.collection('orders').updateMany(
  {
    "customer._id": "u-123",
    status: { $in: ["pending", "processing"] }  // only open orders
  },
  { $set: { "customer.name": "Alice Johnson" } }
)
// Completed orders intentionally keep the old name — historical record

How to answer in an interview

"The Extended Reference Pattern embeds a snapshot of the most-read fields from a referenced document alongside the foreign key. This eliminates $lookup on the hot read path — an order list page can render customer names without any join. The trade-off is controlled duplication: when the customer's name changes, you run an updateMany on open orders. The key discipline is to only embed fields that appear in every query on this document AND change rarely. Frequently-changing fields like loyalty points stay as pure references."

What is the Outlier Pattern? What problem does it solve?

Hard Outlier Pattern Large Arrays

▾

The Problem

Most books have ~10 reviews. But a bestseller might have 100,000 reviews. If you embed reviews in the book document, 99.9% of books are fine — but the 0.1% of bestsellers hit the 16 MB document limit and corrupt your schema for everyone. Designing for the outlier ruins the normal case.

The Outlier Pattern Solution

MongoDB — outlier pattern
// Standard book document — reviews embedded (normal case)
{
  _id: "book-1",
  title: "Learning MongoDB",
  hasOverflow: false,             // sentinel flag
  reviews: [
    { userId: "u-1", rating: 5, text: "Great book!" }
    // ... up to ~100 reviews embedded
  ]
}

// Outlier book — too many reviews to embed
{
  _id: "book-99",
  title: "Harry Potter",
  hasOverflow: true,              // sentinel: check overflow collection
  reviews: [
    // first ~100 reviews still embedded (for fast initial display)
  ]
}

// book_reviews_overflow collection — only exists for outliers
{ _id: ObjectId("..."), bookId: "book-99", reviews: [/* batch of reviews */] }
        

Application Logic

JavaScript (Node.js) — outlier-aware read

async function getReviews(bookId, page = 1) {
  const book = await db.collection('books').findOne({ _id: bookId });

  if (!book.hasOverflow) {
    return book.reviews;  // fast path — embedded data, no extra query
  }

  // Slow path — fetch from overflow collection with pagination
  return db.collection('book_reviews_overflow')
    .find({ bookId })
    .skip((page - 1) * 20)
    .limit(20)
    .toArray();
}

Why Not Just Reference Everything?

If you design the schema assuming all books are bestsellers (full reference pattern), every book page load requires an extra round-trip to the reviews collection — even for books with 3 reviews. The Outlier Pattern gives 99.9% of documents the fast embedded path and handles the 0.1% of outliers gracefully with a fallback.

How to answer in an interview

"The Outlier Pattern handles data skew — when 99% of documents fit a schema but 1% don't. The solution is to design for the common case (embed reviews for most books) and add a sentinel flag like hasOverflow: true on documents that exceed your threshold. Application code checks the flag and takes a different code path for outliers. The fast path (no flag) is a simple embedded read; the slow path queries an overflow collection. This keeps the schema optimal for the majority while correctly handling extreme cases."

Topic C — Schema Validation & Anti-Patterns

How do you enforce schema validation in MongoDB? What operators does it support?

Medium Schema Validation JSON Schema

▾

MongoDB Schema Validation

MongoDB supports JSON Schema validation via the $jsonSchema operator in the collection's validator. You define required fields, field types, value constraints, and patterns. The validator runs on insert and update operations; violations return a WriteError. Validation has two action modes: error (reject) and warn (allow but log).

Creating a Validated Collection

MongoDB Shell — schema validation

db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "email", "createdAt"],
      properties: {
        name: {
          bsonType: "string",
          minLength: 2,
          maxLength: 100,
          description: "Name is required and must be 2–100 chars"
        },
        email: {
          bsonType: "string",
          pattern: "^[^@]+@[^@]+\\.[^@]+$"
        },
        age: {
          bsonType: "int",
          minimum: 0,
          maximum: 130
        },
        role: {
          bsonType: "string",
          enum: ["admin", "editor", "viewer"]
        },
        address: {
          bsonType: "object",
          required: ["city", "zip"],
          properties: {
            city: { bsonType: "string" },
            zip:  { bsonType: "string", pattern: "^[0-9]{5}$" }
          }
        }
      }
    }
  },
  validationLevel:  "strict",  // "strict" (all writes) or "moderate" (existing docs exempt)
  validationAction: "error"   // "error" (reject) or "warn" (log, allow)
})

Adding Validation to an Existing Collection

MongoDB Shell — add validator to existing collection
db.runCommand({
  collMod: "users",
  validator: { $jsonSchema: { /* ... */ } },
  validationLevel:  "moderate",  // existing docs exempt until next write
  validationAction: "warn"       // start with warn to catch issues without blocking
})
        

Mongoose vs Native Validation

Mongoose schema validation runs in the Node.js driver layer — a document that bypasses Mongoose (e.g., direct mongosh insert) skips it. Native $jsonSchema validation runs in the database engine and catches every write regardless of how it's submitted. For data integrity guarantees, use both.

How to answer in an interview

"MongoDB supports JSON Schema validation via $jsonSchema in the collection validator. You can enforce required fields, BSON types, string patterns, numeric ranges, and enum values. Validation runs at the database level — unlike Mongoose validation which only runs in the Node.js layer. I always start new collections with validationAction: 'warn' to discover what existing data violates the schema before switching to 'error' to reject invalid writes. validationLevel: 'moderate' lets existing documents stay, only validating on future writes."

What are the most common MongoDB schema anti-patterns and how do you fix them?

Hard Anti-Patterns Schema Design

▾

The 6 Most Common Anti-Patterns

Anti-Pattern	Symptom	Fix
Massive arrays (unbounded growth)	Arrays with thousands of elements; approaching 16 MB limit; slow updates	Child reference pattern or Bucket Pattern — store array elements in child docs
Massive number of collections	Hundreds of collections for similar data; collection scan on metadata	Consolidate with a discriminator type field; use one collection per entity type
Bloated documents	Large documents (hundreds of KB) rarely fully read; slow queries	Separate hot fields from cold fields — Subset Pattern
Unnecessary indexes	10+ indexes; slow writes; indexes don't fit in RAM	$indexStats to find zero-use indexes; drop them
Case-insensitive query without index	{ $regex: /alice/i } on large collections = COLLSCAN	Store normalised lowercase field; or use a collation index
Separating data that is read together	Multiple $lookup or application-side joins on every request	Embed the data — design for your read pattern

The Subset Pattern — Splitting Hot from Cold Data

MongoDB — subset pattern
// Product page always shows: name, price, 5 recent reviews, thumbnail
// Full review history rarely accessed

// products collection — hot data, small, fast to load
{
  _id: "p-1",
  name: "Laptop Pro",
  price: 1299,
  thumbnail: "https://cdn.example.com/laptop.jpg",
  recentReviews: [   // only last 5 — always shown
    { userId: "u-1", rating: 5, text: "Excellent!" }
  ]
}

// product_reviews collection — cold data, full history
{ _id: ObjectId("..."), productId: "p-1", rating: 4, text: "...", createdAt: new Date() }
// Only fetched when user clicks "See all 1,243 reviews"
        

Fixing Unbounded Arrays — The Symptom Checklist

MongoDB Shell — detect document size issues
// Find the largest documents in a collection
db.products.aggregate([
  { $project: { docSize: { $bsonSize: "$$ROOT" } } },
  { $sort: { docSize: -1 } },
  { $limit: 10 }
])
// docSize approaching 16,000,000 bytes = danger zone
// Documents > 1 MB are candidates for Subset or Bucket Pattern
        

How to answer in an interview

"The three most impactful anti-patterns are: unbounded arrays (a document that grows without limit will hit the 16 MB cap — use child references or the Bucket Pattern), bloated documents (reading a 500 KB document to display 5 fields is wasteful — use the Subset Pattern to split hot and cold data), and too many indexes (indexes have write overhead and must fit in RAM — use $indexStats to find and drop zero-use indexes). MongoDB's Performance Advisor in Atlas automatically flags these."

How do you model a tree structure (categories, org charts) in MongoDB?

Medium Tree Structures Hierarchical Data

▾

Four Tree Modeling Patterns

Pattern	Storage	Find Ancestors	Find Subtree
Parent Reference	{ parentId: "p-1" } on each node	Multi-hop queries (slow)	Must recurse application-side
Child References	{ children: ["c-1","c-2"] } on parent	Multi-hop (slow)	Traverse array
Array of Ancestors	{ ancestors: ["root","cat-1","cat-2"] } on each node	✅ Single doc read	{ ancestors: nodeId } — fast
Materialised Path	{ path: ",root,cat-1,cat-2," } on each node	✅ Regex or $regex	✅ Regex prefix match + text index

Array of Ancestors — Best for Most Use Cases

MongoDB — array of ancestors pattern
// Category tree: Electronics → Computers → Laptops
{ _id: "laptops",   name: "Laptops",    ancestors: ["electronics", "computers"], parentId: "computers" }
{ _id: "computers", name: "Computers",  ancestors: ["electronics"],             parentId: "electronics" }
{ _id: "electronics",name: "Electronics",ancestors: [],                           parentId: null }

// Find all ancestors of "laptops" — single document read
db.categories.find({ _id: "laptops" })  // returns ancestors array

// Find all descendants of "electronics" (entire subtree)
db.categories.find({ ancestors: "electronics" })
// Index on ancestors (multikey) makes this fast

// Show breadcrumb: Electronics > Computers > Laptops
db.categories.find({ _id: { $in: ["electronics", "computers"] } })
  .sort({ /* by depth */ })
        

Materialised Path — Best for Pattern Matching

MongoDB — materialised path pattern
// Store full path as a string
{ _id: "laptops",    path: ",electronics,computers,laptops," }
{ _id: "computers",  path: ",electronics,computers," }

// Find all descendants of "computers" — prefix match
db.categories.find({ path: /^,electronics,computers,/ })
// ✅ Fast with a text or regex index on path
        

How to answer in an interview

"The most practical pattern for category trees and org charts is Array of Ancestors — each node stores the full list of ancestor IDs as an array. This gives you subtree queries with a single indexed find({ ancestors: 'nodeId' }) and breadcrumb rendering with a single $in lookup. The trade-off is that moving a node requires updating ancestors on the node and all descendants. For very deep trees with many moves, Materialised Path is more flexible — store the full path string and use a regex prefix match. Both patterns are far better than Parent Reference for anything beyond two levels."

Walk me through designing a schema for a social media feed — posts, likes, comments, follows.

Hard System Design Real-World

▾

Access Pattern Analysis First

Before designing any schema, identify your read patterns. For a social feed:

Operation	Frequency	Latency Requirement
Render home feed (latest posts from follows)	Very high — every page load	< 100ms
Show post with like count + top comments	Very high	< 50ms
Like / unlike a post	High	< 200ms
Follow / unfollow a user	Medium	< 500ms
Get follower/following counts	High (shown on profile)	< 50ms

Collection Schemas

MongoDB — social feed schema design
// ── users collection ──
{
  _id: "u-1",
  username: "alice",
  displayName: "Alice Smith",
  avatar: "https://cdn.example.com/alice.jpg",
  followerCount: 1420,    // denormalised counter — fast profile render
  followingCount: 382
}

// ── follows collection — child-reference pattern (one-to-squillions) ──
{ followerId: "u-1", followeeId: "u-2", createdAt: new Date() }
// Index: { followerId: 1, followeeId: 1 } unique
// Index: { followeeId: 1 } — "who follows user X"

// ── posts collection ──
{
  _id: ObjectId("..."),
  authorId: "u-1",
  author: {                      // Extended Reference — avoids join on feed render
    username: "alice",
    displayName: "Alice Smith",
    avatar: "https://cdn.example.com/alice.jpg"
  },
  content: "Just learned about MongoDB schema design!",
  mediaUrls: ["https://cdn.example.com/img1.jpg"],
  likeCount: 47,               // denormalised counter
  commentCount: 8,             // denormalised counter
  createdAt: new Date()
}
// Index: { authorId: 1, createdAt: -1 } — user's post history
// Index: { createdAt: -1 } — global feed sort

// ── likes collection — avoids unbounded array in post ──
{ postId: ObjectId("..."), userId: "u-1", createdAt: new Date() }
// Index: { postId: 1, userId: 1 } unique — fast "did I like this?" check

// ── comments collection ──
{
  _id: ObjectId("..."),
  postId: ObjectId("..."),
  authorId: "u-1",
  author: { username: "alice", avatar: "..." }, // Extended Reference
  text: "Great post!",
  createdAt: new Date()
}
// Index: { postId: 1, createdAt: 1 } — comments for a post, oldest first
        

Feed Generation

JavaScript (Node.js) — home feed query
// Step 1: get list of users the current user follows
const following = await db.collection('follows')
  .find({ followerId: currentUserId })
  .map(f => f.followeeId)
  .toArray();

// Step 2: get recent posts from those users (fan-in read model)
const feed = await db.collection('posts')
  .find({ authorId: { $in: following } })
  .sort({ createdAt: -1 })
  .limit(20)
  .toArray();
// author data already embedded — no $lookup needed on render
        

Key Design Decisions Explained

Decision	Why
Separate likes collection (not array in post)	Unbounded — a viral post could have millions of likes; also enables "did I like this?" query with a unique index
Denormalised likeCount / commentCount on post	Feed renders fast — no aggregation query needed. Use $inc to keep counters consistent atomically
Extended Reference (author snapshot) in posts/comments	Feed and comment list render without $lookup; username/avatar rarely changes
Separate follows collection	Users can have millions of followers — can't embed in user doc; enables efficient "get my feed" and "get followers" queries

How to answer in an interview

"I'd start with access patterns, not entities. The hot path is feed rendering — it must be fast. I'd use separate collections for posts, likes, comments, and follows (all unbounded). Posts embed an Extended Reference for author info so the feed renders without joins. Like count and comment count are denormalised counters on the post (updated with $inc) so the feed never needs aggregation. The follows collection enables feed queries via $in on authorId. This fan-in read model works up to ~few thousand follows; beyond that you'd move to a pre-computed feed (fan-out on write)."

🔒

Segment 6 — Transactions & Consistency

MongoDB's ACID guarantees, read/write concerns, and multi-document transactions are heavily tested at senior level. Understanding when to use transactions vs atomic operators, what write concern means for durability, and how change streams enable real-time pipelines is essential.

ACID Guarantees Read / Write Concern Multi-Doc Transactions Atomic Operators Change Streams Idempotency

Questions

Opened

Topics Covered

~50m

Study Time

🧠 Mental Model — Single Document vs Multi-Document Atomicity

MongoDB has always provided atomic operations on a single document — this covers the majority of use cases because a well-designed document embeds related data together. Multi-document transactions (added in 4.0) exist for cases where atomicity across multiple collections is truly unavoidable — but they come with a performance cost.

🏦 The Bank Transfer Analogy

Debiting Alice's account and crediting Bob's are two separate documents. Without a transaction, a crash between the two writes leaves Bob uncredited. With a transaction, both writes succeed or both roll back — the database is never in a half-updated state. But if you'd modelled both balances in one document (e.g., an internal ledger), a single atomic update would suffice without any transaction overhead.

Design to Avoid Transactions

The MongoDB design philosophy is: if you need a transaction, reconsider your schema first. Embedding related data in a single document makes single-document atomicity sufficient for most operations. Use multi-document transactions only when schema changes are genuinely impractical.

Topic A — ACID & Atomicity

Does MongoDB support ACID transactions? What are the guarantees at single-document level?

Easy ACID Atomicity

▾

ACID in MongoDB

Property	Single Document	Multi-Document Transaction (4.0+)
Atomicity	✅ Always — all field updates in one write succeed or fail together	✅ All writes across documents/collections commit or all roll back
Consistency	✅ Validators and unique indexes enforced per write	✅ Validators run; unique constraints enforced across the transaction
Isolation	✅ Snapshot isolation via WiredTiger MVCC	✅ Snapshot isolation — reads see a consistent point-in-time snapshot
Durability	Depends on write concern — w:1 = durable after primary ack; w:majority = durable on majority	Same write concern rules apply to the commit operation

Single-Document Atomicity — The Core Guarantee

Every MongoDB write operation (insert, update, delete, findAndModify) on a single document is atomic. Even if you update 20 fields simultaneously, observers either see all 20 changes or none — never a half-updated document. This is implemented by WiredTiger's MVCC (Multi-Version Concurrency Control) — readers see the committed version while a write is in progress.

MongoDB Shell — atomic multi-field update

// All three field changes are atomic — no partial state visible
db.orders.updateOne(
  { _id: orderId },
  {
    $set: { status: "shipped", shippedAt: new Date() },
    $inc: { version: 1 }
  }
)
// A concurrent reader sees either the old state OR the new state
// — never status="shipped" with the old shippedAt

When Single-Document Atomicity Is Enough

Because MongoDB documents can embed related data, many operations that would require a transaction in a relational DB are single-document writes in MongoDB:

Scenario	SQL Approach	MongoDB Single-Doc Approach
Add order + line items	INSERT into orders + INSERT into order_items (transaction)	Single insertOne with items embedded
Update product + inventory	UPDATE products + UPDATE inventory (transaction)	Embed inventory in product doc; single updateOne
Publish post + update count	INSERT post + UPDATE user.postCount (transaction)	$inc postCount in same updateOne that sets published:true

How to answer in an interview

"Yes, MongoDB is fully ACID. At the single-document level it has always been atomic — all field changes in one write are committed together or not at all, implemented via WiredTiger's MVCC. Since MongoDB 4.0, multi-document ACID transactions are supported across collections and databases, with the same snapshot isolation semantics. The MongoDB philosophy is to design schemas that make single-document atomicity sufficient — embedded documents eliminate the need for transactions in most cases. Transactions are available when truly needed but carry a performance cost."

What are write concerns in MongoDB? How does w:majority differ from w:1?

Medium Write Concern Durability

▾

What Write Concern Controls

Write concern specifies how many replica set members must acknowledge a write before MongoDB reports it as successful. It is the dial between performance (fewer acknowledgements = faster) and durability (more acknowledgements = safer against data loss on failover).

Write Concern Options

Write Concern	Meaning	Risk	Use Case
w: 0 (fire & forget)	No acknowledgement — returns immediately	Data loss if mongod crashes before writing	High-throughput logging where loss is acceptable
w: 1 (default)	Primary acknowledges after writing to its memory journal	Data loss if primary fails before replication completes	General purpose — balanced performance/safety
w: majority	Majority of replica set members acknowledge	Minimal — survives primary failover	Financial data, user records, anything that must not be lost
w: N (number)	N specific members acknowledge	Depends on N	Custom replication requirements

Write Concern + Journal (j option)

JavaScript (Node.js) — write concern options

// w:majority + j:true — strongest durability guarantee
// Write survives majority failover AND primary crash before journal flush
await db.collection('payments').insertOne(
  { amount: 500, currency: "USD" },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
)

// High-throughput event logging — speed over safety
await db.collection('events').insertOne(
  { type: "page_view", url: "/home" },
  { writeConcern: { w: 0 } }
)

// Set default write concern at client level (recommended)
const client = new MongoClient(uri, {
  writeConcern: { w: "majority", j: true }
})

w:1 vs w:majority — The Rollback Scenario

With w:1: the primary acknowledges the write, but the secondaries haven't replicated it yet. If the primary crashes and a secondary is elected, the un-replicated write is rolled back — it never happened as far as the new primary knows. With w:majority: the write is on a majority of nodes before acknowledgement — it will survive any single-node failure, including the current primary failing.

How to answer in an interview

"Write concern is MongoDB's durability dial. w:1 (default) means the primary acknowledges after writing to its own journal — fast, but if the primary crashes before replication completes, that write can be rolled back on failover. w:majority waits until a majority of replica set members have the write — it survives any single-node failure. For financial data or anything that must not be lost, use w:majority with j:true (journal flushed to disk). For high-throughput event logging where occasional loss is acceptable, w:1 or even w:0 is fine."

What are read concerns? What is the difference between local, majority, and linearizable?

Medium Read Concern Consistency

▾

Read Concern Overview

Read concern controls how current and consistent the data returned by a read must be. It answers: "can I read data that might be rolled back, and do I need to wait for majority confirmation before reading?"

Read Concern Levels

Level	Returns	Rollback Risk	Latency
local (default)	Most recent data on the queried node — may not be majority-committed	⚠️ Yes — data may be rolled back on failover	Lowest
available	Same as local but for sharded clusters (may return orphaned chunks)	⚠️ Yes	Lowest
majority	Only data that has been acknowledged by a majority of nodes	✅ No rollback possible	Medium
linearizable	Most recent majority-committed data — guarantees real-time ordering	✅ No rollback + causal consistency	Highest — waits for majority ack
snapshot	Consistent snapshot at transaction start time (transactions only)	✅ Consistent across all reads in the transaction	Medium

Using Read Concern in Practice

JavaScript (Node.js) — read concern options

// Default — fast, may read un-committed data
db.collection('products').findOne({ _id: id })

// Majority — only read data that won't be rolled back
db.collection('payments').findOne(
  { _id: id },
  { readConcern: { level: "majority" } }
)

// Linearizable — strongest; single-document reads only
// Waits for majority acknowledgment before returning
db.collection('inventory').findOne(
  { _id: productId },
  { readConcern: { level: "linearizable" }, maxTimeMS: 5000 }
)
// Use when: "is there still stock?" before a purchase commitment

Linearizable Has Constraints

linearizable only works on primary reads of single documents. It cannot be used with aggregations, range queries, or reads from secondaries. Always set maxTimeMS to avoid blocking indefinitely if the primary loses contact with the majority.

How to answer in an interview

"Read concern controls what data you're allowed to see. The default 'local' returns whatever is on the queried node, including writes that haven't been majority-replicated yet and could be rolled back on failover. 'majority' only returns data confirmed by a majority of nodes — safe from rollback. 'linearizable' is the strongest guarantee: it ensures you read the most recent majority-committed value and that any write you read cannot be rolled back — but it only works for single-document primary reads and adds latency waiting for majority confirmation. For financial reads — 'did this payment succeed?' — use majority or linearizable."

How do multi-document transactions work in MongoDB? Show the full pattern with error handling.

Hard Transactions Session API

▾

Transaction Requirements

Requirement	Detail
Replica set required	Transactions require a replica set or sharded cluster — standalone mongod does not support them
Session required	Every transaction runs inside a ClientSession — pass the session to every operation
60-second limit	Transactions must commit within 60 seconds (configurable) — long transactions are killed
No DDL inside	Cannot create collections or indexes inside a transaction

Full Transaction Pattern with Retry Logic

JavaScript (Node.js) — transaction with retry

async function transferFunds(fromId, toId, amount) {
  const session = client.startSession();

  try {
    await session.withTransaction(async () => {
      const accounts = db.collection('accounts');

      // Step 1: Debit source account — check balance atomically
      const from = await accounts.findOneAndUpdate(
        { _id: fromId, balance: { $gte: amount } },  // only if sufficient funds
        { $inc: { balance: -amount } },
        { session, returnDocument: "after" }
      );
      if (!from) throw new Error("Insufficient funds");

      // Step 2: Credit destination account
      await accounts.updateOne(
        { _id: toId },
        { $inc: { balance: amount } },
        { session }
      );

      // Step 3: Write audit record
      await db.collection('transfers').insertOne(
        { fromId, toId, amount, ts: new Date() },
        { session }
      );
    }, {
      readConcern:  { level: "snapshot"  },
      writeConcern: { w: "majority" },
      maxCommitTimeMS: 5000
    });
    // withTransaction automatically retries transient errors
    // (TransientTransactionError) and commit errors (UnknownTransactionCommitResult)

  } finally {
    await session.endSession();
  }
}

withTransaction vs Manual startTransaction

Approach	Pros	Cons
session.withTransaction(fn)	Automatically retries on TransientTransactionError and UnknownTransactionCommitResult; handles commit/abort	Less control over retry logic
Manual startTransaction / commitTransaction / abortTransaction	Full control over retry behaviour and error handling	Must implement retry logic yourself — easy to get wrong

Performance Cost of Transactions

Transactions hold locks and generate extra replication overhead. On a busy replica set, transactions that touch many documents can cause write conflicts and transient errors. Keep transactions short (milliseconds, not seconds), touch as few documents as possible, and avoid user-facing input between transaction steps.

How to answer in an interview

"Multi-document transactions require a ClientSession — every operation inside the transaction must receive the session object as an option. I always use session.withTransaction() rather than manual startTransaction/commit/abort because it automatically retries on transient network errors and unknown commit results, which are both expected in distributed systems. Key constraints: 60-second max duration, no DDL inside, requires a replica set. For the actual transaction I use readConcern snapshot and writeConcern majority for full ACID guarantees."

Topic B — Atomic Operators & Patterns

What atomic update operators does MongoDB provide? When do they replace the need for a transaction?

Medium Atomic Operators $inc / $set / $push

▾

Atomic Update Operators Reference

Operator	Action	Atomic?
$set	Set a field to a value	✅ Yes
$unset	Remove a field	✅ Yes
$inc	Increment/decrement a numeric field	✅ Yes — no read-modify-write needed
$mul	Multiply a numeric field	✅ Yes
$min / $max	Update field only if new value is less/greater than current	✅ Yes
$push	Append element to array	✅ Yes
$addToSet	Append element to array only if not already present	✅ Yes
$pull / $pop	Remove element(s) from array	✅ Yes
$currentDate	Set field to current date/timestamp	✅ Yes
$setOnInsert	Only set fields when upsert creates a new document	✅ Yes

$inc Replaces Read-Modify-Write

JavaScript (Node.js) — atomic counter

// ❌ Race condition — two clients can both read 5, both write 6
const product = await findOne({ _id: id });
await updateOne({ _id: id }, { $set: { stock: product.stock - 1 } });

// ✅ Atomic — database decrements in one operation, no race
await db.collection('products').updateOne(
  { _id: id, stock: { $gt: 0 } },  // guard: only decrement if in stock
  { $inc: { stock: -1 } }
)
// If matchedCount === 0 → out of stock

Combining Multiple Operators — Still One Atomic Write

JavaScript (Node.js) — multi-operator atomic update
// Claim a queued job — find-and-mark atomically
const job = await db.collection('jobs').findOneAndUpdate(
  { status: "queued" },
  {
    $set:         { status: "running", workerId: workerId },
    $currentDate: { startedAt: true },
    $inc:         { attempts: 1 }
  },
  { sort: { priority: -1, createdAt: 1 }, returnDocument: "after" }
);
// All three changes are atomic — no two workers claim the same job
// If null → no queued jobs available
        

How to answer in an interview

"MongoDB's atomic operators — $inc, $push, $addToSet, $min/$max — allow you to make complex updates without a read-modify-write cycle, eliminating entire classes of race conditions. The classic example is a stock counter: instead of reading the stock, decrementing in application code, and writing back (which has a race window), use $inc with a filter guard ({ stock: { $gt: 0 } }) — the decrement and the check happen atomically. Combined with findOneAndUpdate, you can claim queue jobs, enforce state machines, and update multiple fields in one atomic round-trip without needing a multi-document transaction."

How do you implement optimistic concurrency control in MongoDB?

Hard Optimistic Locking Version Field

▾

What Optimistic Concurrency Control Is

Optimistic concurrency control (OCC) assumes conflicts are rare. Instead of locking, it allows multiple readers/writers to proceed concurrently, then detects conflicts at write time using a version field. If the version has changed since you read the document, your update is rejected — you re-read and retry. It avoids the performance overhead of pessimistic locks while preventing lost updates.

Version Field Pattern

JavaScript (Node.js) — optimistic locking with version field

// Document has a __v (version) field
// { _id: "p-1", stock: 10, price: 99, __v: 5 }

async function updateProductPrice(productId, newPrice) {
  while (true) {
    // Step 1: Read current document including version
    const product = await db.collection('products').findOne({ _id: productId });
    if (!product) throw new Error("Not found");

    // Step 2: Business logic with the current state
    const newDoc = { ...product, price: newPrice };

    // Step 3: Write — only succeeds if __v hasn't changed
    const result = await db.collection('products').updateOne(
      { _id: productId, __v: product.__v },  // version guard
      { $set: { price: newPrice }, $inc: { __v: 1 } }  // increment version
    );

    if (result.matchedCount === 1) return;  // success

    // matchedCount === 0 → someone else updated — retry
    // In practice: add retry limit + exponential backoff
  }
}

OCC vs Pessimistic Locking vs Transactions

Approach	Mechanism	Best For	Downside
Atomic operators ($inc)	Single-field atomic update — no version field needed	Counters, append-only arrays	Only works when operation is naturally atomic
Optimistic locking (__v)	Read + version-checked write; retry on conflict	Low-contention updates to complex fields	Retries under high contention; more application code
findOneAndUpdate	Atomic read + write in one operation	State machines, job queues, claiming resources	Entire update must fit in one operation
Multi-doc transaction	ACID across multiple documents/collections	Financial transfers, order + inventory atomicity	Performance overhead, 60s limit, replica set required

Mongoose Handles This Automatically

Mongoose's optimisticConcurrency: true option and the versionKey (__v) field implement this pattern automatically. Every save() adds a version guard and increments the version — a concurrent save throws a VersionError.

How to answer in an interview

"Optimistic concurrency control uses a version field (__v) on each document. Before writing, you include the current version in your filter — if the document has been modified by someone else, the filter won't match and matchedCount is 0, signalling a conflict that you retry. This avoids locks, scales well for low-contention scenarios, and is what Mongoose's optimisticConcurrency option implements. For high-contention scenarios (like flash sales), you'd use $inc with a filter guard instead — it's a single atomic operation with no retry loop."

What are Change Streams? How do you use them and what are their use cases?

Hard Change Streams Real-Time

▾

What Change Streams Are

Change streams are a real-time notification mechanism built on the oplog (replication log). They allow applications to subscribe to data changes on a collection, database, or entire cluster and receive a stream of change events without polling. Change streams guarantee at-least-once delivery and support resumability via a resume token.

Basic Change Stream Usage

JavaScript (Node.js) — change stream

const changeStream = db.collection('orders').watch(
  // Optional pipeline to filter events
  [{ $match: { operationType: { $in: ["insert", "update"] } } }],
  { fullDocument: "updateLookup" }  // include full doc on updates
);

for await (const change of changeStream) {
  console.log(change.operationType);   // "insert" | "update" | "delete" | "replace"
  console.log(change.fullDocument);   // the new document state
  console.log(change.updateDescription);// { updatedFields, removedFields }
  console.log(change._id);             // resume token — store this
}

// Resumable stream — restart from a saved token after disconnect
const resumableStream = db.collection('orders').watch([], {
  resumeAfter: savedResumeToken  // pick up exactly where you left off
});

Change Stream Use Cases

Use Case	How Change Streams Help
Real-time notifications	Watch orders collection; push WebSocket event when order status changes to "shipped"
Cache invalidation	Watch products; evict Redis cache entry when product price is updated
Cross-service sync	Watch users collection; replicate changes to Elasticsearch or a read replica
Audit logging	Watch sensitive collections; write every change to an immutable audit log
Event-driven microservices	Outbox pattern — watch an outbox collection and publish events to Kafka/RabbitMQ

The Outbox Pattern with Change Streams

JavaScript (Node.js) — outbox pattern

// Application writes order + outbox event in one transaction
await session.withTransaction(async () => {
  await db.collection('orders').insertOne({ ...orderData }, { session });
  await db.collection('outbox').insertOne(
    { event: "ORDER_PLACED", payload: orderData, published: false },
    { session }
  );
});

// Separate publisher process watches outbox and publishes to Kafka
const stream = db.collection('outbox').watch([
  { $match: { operationType: "insert" } }
]);
for await (const evt of stream) {
  await kafka.publish(evt.fullDocument.event, evt.fullDocument.payload);
  await db.collection('outbox').updateOne({ _id: evt.fullDocument._id }, { $set: { published: true } });
}

How to answer in an interview

"Change streams let you subscribe to real-time data changes on a collection, database, or cluster without polling. They're built on the oplog and guarantee at-least-once delivery with resumability via a resume token — if your consumer crashes, it restarts from the saved token and processes no missed events. The most important production pattern is the transactional outbox: write your domain event to an outbox collection inside the same transaction as the business write, then a change stream publisher reliably forwards it to Kafka or RabbitMQ. This guarantees at-least-once event delivery without distributed transactions."

How do you implement idempotent operations in MongoDB? Why does it matter?

Hard Idempotency Upsert

▾

Why Idempotency Matters

In distributed systems, networks are unreliable. A write might succeed on the server but the acknowledgement is lost — the client retries and the operation runs twice. An idempotent operation produces the same result whether executed once or N times. Without idempotency, retries cause duplicate records, double charges, or double-decremented inventory.

Idempotency Techniques

JavaScript (Node.js) — idempotency patterns

// ── Technique 1: Natural idempotent key with upsert ──
// Payment identified by idempotencyKey — safe to retry
await db.collection('payments').updateOne(
  { idempotencyKey: req.headers['idempotency-key'] },
  { $setOnInsert: {
    amount: req.body.amount,
    customerId: req.body.customerId,
    createdAt: new Date()
  }},
  { upsert: true }
)
// $setOnInsert only runs if a new doc is created
// If idempotencyKey already exists → matched, nothing changes

// ── Technique 2: Conditional update — only update if state allows ──
// State machine guard prevents double-processing
const result = await db.collection('orders').updateOne(
  { _id: orderId, status: "pending" },   // guard: only from pending
  { $set: { status: "processing", processedAt: new Date() } }
)
// Second call: status is already "processing" → filter doesn't match
// matchedCount === 0 is NOT an error — it's already processed

// ── Technique 3: $addToSet for deduplication ──
// Tag can only be added once regardless of how many times called
await db.collection('products').updateOne(
  { _id: productId },
  { $addToSet: { tags: "featured" } }  // idempotent — duplicate safe
)

Idempotency Key Collection Pattern

JavaScript (Node.js) — idempotency key store
// Persist idempotency keys with TTL — expire after 24 hours
db.createCollection("idempotency_keys", {
  validator: { $jsonSchema: { required: ["key", "result"] } }
})
db.idempotency_keys.createIndex({ key: 1 }, { unique: true })
db.idempotency_keys.createIndex({ createdAt: 1 }, { expireAfterSeconds: 86400 })

async function processIdempotently(key, fn) {
  try {
    // Try to claim the key atomically
    await db.collection("idempotency_keys").insertOne({ key, createdAt: new Date() });
    const result = await fn();
    await db.collection("idempotency_keys").updateOne({ key }, { $set: { result } });
    return result;
  } catch (e) {
    if (e.code === 11000) {   // duplicate key — already processed
      const existing = await db.collection("idempotency_keys").findOne({ key });
      return existing.result;  // return cached result
    }
    throw e;
  }
}
        

How to answer in an interview

"Idempotency prevents duplicate effects when operations are retried — critical in payment processing and distributed systems. MongoDB gives you three tools: upsert with $setOnInsert (safe to re-run; only inserts on first call), conditional state-machine updates (the filter guard prevents applying the same transition twice), and $addToSet (adding the same tag is naturally idempotent). For external API calls like payments, I store an idempotency key in a collection with a unique index and TTL — a duplicate key error (code 11000) means it already ran and I return the cached result."

What is causal consistency in MongoDB? How do sessions enable it?

Hard Causal Consistency Sessions

▾

The Problem — Stale Reads After Writes

Without causal consistency, this sequence can happen: 1) You write a document to the primary. 2) You immediately read from a secondary to reduce primary load. 3) The secondary hasn't replicated your write yet — you read stale data and your own write appears to have vanished. This violates the "read your own writes" guarantee.

How Causal Consistency Works

A causally consistent session tracks a cluster time and an operation time. When you read from a secondary, MongoDB ensures the secondary has applied at least up to the operation time of your last write before serving the read. This guarantees:

Guarantee	Meaning
Read your own writes	A read always sees the result of writes in the same session
Monotonic reads	Reads never go back in time — you never see older data after seeing newer data
Monotonic writes	Writes in a session are applied in order
Writes follow reads	A write reflects the state seen by the most recent read in the session

Enabling Causal Consistency

JavaScript (Node.js) — causally consistent session
// Start a causally consistent session
const session = client.startSession({ causalConsistency: true });

try {
  // Write to primary
  await db.collection('users').updateOne(
    { _id: userId },
    { $set: { email: "new@example.com" } },
    { session, writeConcern: { w: "majority" } }
  );

  // Read from secondary — causal consistency ensures we see our write
  const user = await db.collection('users').findOne(
    { _id: userId },
    { session, readConcern: { level: "majority" }, readPreference: "secondary" }
  );
  // user.email is guaranteed to be "new@example.com"
} finally {
  await session.endSession();
}
        

Requirement for Causal Consistency

Causal consistency requires readConcern: majority and writeConcern: majority. Using weaker concerns breaks the guarantees — the session tracks cluster time but the secondary is not obligated to wait before serving reads.

How to answer in an interview

"Causal consistency solves the 'read your own writes' problem when reading from secondaries. Without it, you write to the primary, immediately read from a secondary that hasn't replicated yet, and see stale data. A causally consistent session tracks operation time — when you read from a secondary, MongoDB waits until that secondary has applied at least up to your last write's timestamp before responding. It requires both readConcern and writeConcern set to majority, and the same session object passed to both operations."

Walk me through a complete payment processing flow using MongoDB transactions, write concern, and idempotency.

Hard Real-World Transactions

▾

Requirements

A payment API endpoint must: (1) Debit the customer's wallet, (2) Create a payment record, (3) Emit an ORDER_PAID event to the outbox. All three must be atomic — partial failure leaves the system in an inconsistent state. The endpoint must be idempotent — the client may retry on timeout.

Complete Implementation

JavaScript (Node.js) — production payment flow

async function processPayment({ orderId, customerId, amount, idempotencyKey }) {
  const session = client.startSession();

  try {
    let payment;

    await session.withTransaction(async () => {

      // ── Step 1: Idempotency check — claim the key inside the transaction ──
      const keyInsert = await db.collection('idempotency_keys').updateOne(
        { key: idempotencyKey },
        { $setOnInsert: { key: idempotencyKey, createdAt: new Date() } },
        { upsert: true, session }
      );

      if (keyInsert.upsertedCount === 0) {
        // Key already exists — fetch and return existing payment
        payment = await db.collection('payments').findOne(
          { idempotencyKey }, { session }
        );
        return;  // abort the rest — already processed
      }

      // ── Step 2: Debit wallet — atomic check-and-debit ──
      const wallet = await db.collection('wallets').findOneAndUpdate(
        { customerId, balance: { $gte: amount } },
        { $inc: { balance: -amount } },
        { session, returnDocument: "after" }
      );
      if (!wallet) throw Object.assign(new Error("Insufficient funds"), { code: "INSUFFICIENT_FUNDS" });

      // ── Step 3: Create payment record ──
      payment = {
        _id: new ObjectId(), orderId, customerId, amount,
        status: "completed", idempotencyKey, createdAt: new Date()
      };
      await db.collection('payments').insertOne(payment, { session });

      // ── Step 4: Write to outbox (change stream will publish to Kafka) ──
      await db.collection('outbox').insertOne({
        event: "ORDER_PAID", orderId, amount, paymentId: payment._id,
        createdAt: new Date()
      }, { session });

    }, { readConcern: { level: "snapshot" }, writeConcern: { w: "majority", j: true } });

    return payment;

  } finally {
    await session.endSession();
  }
}

Design Decisions Explained

Decision	Why
Idempotency check inside transaction	If key doesn't exist yet, claim it and process atomically — no window between check and insert
findOneAndUpdate for debit	Balance check and decrement are atomic — eliminates the TOCTOU race condition
Outbox inside same transaction	Event is emitted if and only if the payment commits — no lost events, no phantom events
writeConcern majority + j:true	Payment is on disk on a majority of nodes — survives primary crash, power outage
withTransaction (not manual)	Automatic retry on TransientTransactionError — handles transient network partitions

How to answer in an interview

"I'd use a multi-document transaction with four writes: claim the idempotency key (using $setOnInsert + upsert), debit the wallet with a balance guard (findOneAndUpdate with $gte filter), insert the payment record, and write to the outbox. All four are in the same transaction — either all commit or all roll back. writeConcern majority + j:true ensures the payment survives any single-node failure. The idempotency key inside the transaction means a client retry sees the existing payment rather than double-charging. A change stream publisher then reliably forwards the outbox event to downstream services."

🔁

Segment 7 — Replication & High Availability

Coming soon — replica sets, elections, failover, sharding, shard keys, mongos.

🛡️

Segment 8 — Security & Operations

Coming soon — RBAC, authentication, encryption, backups, profiler, Atlas, connection pooling.

🦾

Segment 9 — Mongoose & MERN Stack

Mongoose is the ODM layer every MERN application relies on. Interviewers expect you to know Schemas, Models, validation, middleware hooks, virtuals, populate, and connection management — not just the raw MongoDB driver. This segment covers everything that appears in real MERN interviews.

Schema & Models Validation Middleware / Hooks Virtuals populate() Connection Management

Questions

Opened

Topics Covered

~50m

Study Time

🧠 Mental Model — Raw Driver vs Mongoose

The native MongoDB driver gives you raw power — you talk directly to the database with no guardrails. Mongoose adds a structured layer on top: it enforces a schema, runs validation before every write, fires middleware hooks, and provides a cleaner API. Think of it as the difference between raw SQL and an ORM like Sequelize.

✅ What Mongoose adds

Schema definition with types and constraints
Built-in and custom validation before every save
Pre/post middleware hooks (before save, after find…)
Virtuals — computed properties not stored in DB
populate() — automatic reference resolution
Plugins — reusable schema extensions
TypeScript-friendly with generics support

⚠ Mongoose trade-offs

Overhead — validation + middleware add latency
Abstraction can hide what queries actually run
populate() is N+1 queries under the hood
Schema versioning (__v) adds noise to documents
Some advanced MongoDB features need raw driver access

Topic A — Schema & Model

What is Mongoose? What problem does it solve over the raw MongoDB Node.js driver?

Easy Mongoose ODM

▾

What is Mongoose?

Mongoose is an Object Document Mapper (ODM) for MongoDB and Node.js. It provides a schema-based solution to model your application data, adds automatic validation, type casting, query building, and middleware hooks — things the raw MongoDB driver intentionally leaves to you.

Raw Driver vs Mongoose — Side by Side

JavaScript

// ── Raw MongoDB Driver ──
const { MongoClient } = require('mongodb');
const db = client.db('myapp');

// No validation — any shape goes in
await db.collection('users').insertOne({
  name: 999,       // number instead of string — driver won't care
  email: "invalid"  // not an email — driver won't care
});

// ── Mongoose ──
const mongoose = require('mongoose');

const userSchema = new mongoose.Schema({
  name:  { type: String,  required: true },
  email: { type: String,  required: true, match: /^.+@.+\..+$/ }
});
const User = mongoose.model('User', userSchema);

await new User({ name: 999, email: "invalid" }).save();
// Throws ValidationError: name must be a string, email is invalid

When to Use Mongoose vs Raw Driver

Situation	Recommendation
MERN/Express application with structured data	Mongoose — validation + hooks save a lot of boilerplate
Microservice needing max performance, flexible docs	Raw driver — less overhead, more control
Complex aggregation pipelines	Mongoose supports them, but raw driver is cleaner for heavy pipeline work
Rapid prototyping with TypeScript	Mongoose — great TS support out of the box

How to answer in an interview

"Mongoose is an ODM — it sits between your Node.js code and MongoDB and adds a structured layer the raw driver doesn't have: schema definition, type casting, validation before every write, and middleware hooks. The raw driver lets you insert any shape of document; Mongoose enforces that documents conform to your schema. For MERN apps, Mongoose is almost always worth it for the structure and DX it provides, though it adds some overhead over the raw driver."

What is a Mongoose Schema? What types and options can you define in it?

Easy Schema

▾

What is a Schema?

A Mongoose Schema is a blueprint for a MongoDB collection's documents. It defines the fields, their data types, default values, validators, and other constraints. The Schema itself does not write to MongoDB — it is passed to mongoose.model() to create a Model.

Schema Definition — Full Example

JavaScript (Mongoose)

const { Schema, model } = require('mongoose');

const userSchema = new Schema(
  {
    name: {
      type: String,
      required: [true, 'Name is required'],  // custom error message
      trim: true,                               // auto-trim whitespace
      minlength: 2,
      maxlength: 100
    },
    email: {
      type: String,
      required: true,
      unique: true,                            // creates a unique index
      lowercase: true,                         // auto-lowercase before save
      match: [/^.+@.+\..+$/, 'Invalid email']
    },
    age: {
      type: Number,
      min: 0,
      max: 120
    },
    role: {
      type: String,
      enum: ['user', 'admin', 'moderator'],  // only these values allowed
      default: 'user'
    },
    isActive: { type: Boolean, default: true },
    tags: [String],                              // shorthand for array of strings
    address: {                                  // nested subdocument
      city:    String,
      country: { type: String, default: 'India' }
    },
    profileId: { type: Schema.Types.ObjectId, ref: 'Profile' } // reference
  },
  {
    timestamps: true,   // auto adds createdAt and updatedAt
    versionKey: false  // removes the __v field
  }
);

Schema Types Quick Reference

Stringtrim, lowercase, uppercase, match, enum, minlength, maxlength

Numbermin, max

Datemin, max, default: Date.now

Booleandefault: true/false

ObjectIdref: 'ModelName' for populate

MixedAny type — no validation

Array[String], [Number], [Schema]

MapKey-value pairs with typed values

Always Set timestamps: true

{ timestamps: true } in schema options automatically adds createdAt and updatedAt fields managed by Mongoose. This saves you from writing pre-save hooks for this very common requirement.

How to answer in an interview

"A Mongoose Schema defines the shape and rules for documents in a collection. Each field gets a type plus optional constraints like required, unique, min/max, enum, and default. The schema options object is equally important — timestamps: true auto-adds createdAt/updatedAt, and versionKey: false removes the noisy __v field. The Schema itself doesn't interact with MongoDB — you pass it to mongoose.model() to create a queryable Model."

What is a Mongoose Model? How does it differ from a Schema?

Easy Model

▾

Schema vs Model — The Key Distinction

	Schema	Model
What it is	Blueprint / class definition	The compiled, MongoDB-connected class
Talks to MongoDB?	No	Yes — all CRUD goes through the Model
Analogy	Architecture blueprint of a house	The actual house built from that blueprint
Created with	`new mongoose.Schema({...})`	`mongoose.model('Name', schema)`

Creating & Using a Model

JavaScript (Mongoose)

const userSchema = new Schema({ name: String, email: String });

// Model name 'User' → maps to collection 'users' (auto-pluralised, lowercased)
const User = mongoose.model('User', userSchema);

// ── CREATE ──
const user = new User({ name: 'Alice', email: 'alice@x.com' });
await user.save();

// Shorthand (create = new + save)
await User.create({ name: 'Bob', email: 'bob@x.com' });

// ── READ ──
const users    = await User.find({ isActive: true });
const oneUser  = await User.findById(id);
const byEmail  = await User.findOne({ email: 'alice@x.com' });

// ── UPDATE ──
await User.findByIdAndUpdate(id, { $set: { name: 'Alice B' } }, { new: true });

// ── DELETE ──
await User.findByIdAndDelete(id);

⚠ { new: true } on findByIdAndUpdate

By default, findByIdAndUpdate returns the document before the update. Pass { new: true } to get the updated document back — the same as returnDocument: 'after' in the raw driver.

How to answer in an interview

"A Schema is the blueprint — it defines fields, types, and validation rules but has no database connection. A Model is the compiled class created by passing a schema to mongoose.model() — it is what you use to create, query, update, and delete documents. The model name you pass (e.g., 'User') determines the collection name — Mongoose auto-pluralises it to 'users'."

Topic B — Validation & Middleware

How does Mongoose validation work? What are built-in validators vs custom validators?

Medium Validation

▾

When Does Validation Run?

Mongoose validation runs automatically before every save() and create(). It does not run automatically on updateOne, findByIdAndUpdate etc. unless you pass { runValidators: true } explicitly.

Built-in Validators

JavaScript (Mongoose Schema)

const productSchema = new Schema({
  name:     { type: String,  required: true, minlength: 3, maxlength: 200 },
  price:    { type: Number,  required: true, min: 0 },
  category: { type: String,  enum: ['electronics', 'books', 'clothing'] },
  sku:      { type: String,  unique: true, match: /^[A-Z]{3}-\d{4}$/ }
});

Custom Validators

JavaScript (Mongoose)

const userSchema = new Schema({
  phone: {
    type: String,
    validate: {
      validator: function(v) {
        return /^\+?[\d\s\-]{10,15}$/.test(v);
      },
      message: props => `${props.value} is not a valid phone number`
    }
  },
  password: {
    type: String,
    validate: {
      validator: async function(v) {      // async validator
        return v.length >= 8;
      },
      message: 'Password must be at least 8 characters'
    }
  }
});

// Trigger validators on update operations too
await User.findByIdAndUpdate(id, update, { runValidators: true, new: true });

⚠ Validation Doesn't Run on Update by Default

updateOne(), updateMany(), and findByIdAndUpdate() bypass Mongoose validation unless you pass { runValidators: true }. This is a common source of data quality bugs — always add this option if you rely on schema validators.

How to answer in an interview

"Mongoose has built-in validators like required, min, max, enum, and match that run automatically before save(). For custom logic, you pass a validate object with a validator function — it can be sync or async. The critical gotcha: validation does NOT run on updateOne or findByIdAndUpdate by default — you must pass { runValidators: true } to enable it."

What are Mongoose middleware (hooks)? Explain pre and post hooks with examples.

Medium Middleware Hooks

▾

What are Middleware Hooks?

Mongoose middleware (also called hooks) are functions that run before or after specific operations — save, validate, remove, find, update etc. They let you inject logic into the document lifecycle without cluttering your route handlers.

Pre Hooks — Run Before the Operation

JavaScript (Mongoose)

const bcrypt = require('bcryptjs');

// Hash password BEFORE saving — classic use case
userSchema.pre('save', async function(next) {
  if (!this.isModified('password')) return next(); // only hash if changed
  this.password = await bcrypt.hash(this.password, 12);
  next();
});

// Normalise email before saving
userSchema.pre('save', function(next) {
  this.email = this.email.toLowerCase().trim();
  next();
});

// Pre-find: automatically exclude deleted documents (soft delete)
userSchema.pre(/^find/, function(next) {  // regex matches find, findOne, findById…
  this.where({ isDeleted: { $ne: true } });
  next();
});

Post Hooks — Run After the Operation

JavaScript (Mongoose)

// Post-save: send welcome email after user created
userSchema.post('save', async function(doc) {
  if (doc.isNew) {
    await sendWelcomeEmail(doc.email);
  }
});

// Post-find: log slow queries
userSchema.post('find', function(result) {
  console.log(`Found ${result.length} users`);
});

// Post hook with error handling
userSchema.post('save', function(error, doc, next) {
  if (error.name === 'MongoServerError' && error.code === 11000) {
    next(new Error('Email already exists'));  // duplicate key → friendly error
  } else {
    next(error);
  }
});

Hooks Available

saveBefore/after save() and create()

validateBefore/after validation runs

removeBefore/after deleteOne on doc instance

find / findOneBefore/after query execution

updateOneBefore/after updateOne queries

aggregateBefore/after aggregation pipeline

How to answer in an interview

"Mongoose middleware are hooks that intercept operations in the document lifecycle. Pre hooks run before — the most common use case is hashing passwords in a pre-save hook so the plain text never reaches the database. Post hooks run after — useful for side effects like sending emails or logging. A powerful pattern for soft deletes is a pre-find hook that automatically adds { isDeleted: { $ne: true } } to every query — transparent to the rest of the app."

Topic C — Virtuals, Populate & Plugins

What are Mongoose virtuals? How and when do you use them?

Medium Virtuals

▾

What are Virtuals?

Virtuals are computed properties on a document that are not stored in MongoDB. They exist only in application memory, derived from other fields. Virtuals are perfect for values that can always be calculated from persisted data — no need to duplicate them in the database.

Defining Virtuals — Getter & Setter

JavaScript (Mongoose)

const userSchema = new Schema({
  firstName: String,
  lastName:  String,
  birthYear: Number
}, { toJSON: { virtuals: true }, toObject: { virtuals: true } });
// ↑ Must enable for virtuals to appear in JSON output

// Virtual getter — computed from other fields
userSchema.virtual('fullName').get(function() {
  return `${this.firstName} ${this.lastName}`;
});

// Virtual setter — split fullName back into parts
userSchema.virtual('fullName').set(function(v) {
  const [first, ...rest] = v.split(' ');
  this.firstName = first;
  this.lastName  = rest.join(' ');
});

// Age virtual — calculated from birthYear
userSchema.virtual('age').get(function() {
  return new Date().getFullYear() - this.birthYear;
});

// Usage
const user = await User.findById(id);
console.log(user.fullName); // "Alice Smith" — not stored in DB
console.log(user.age);      // 30 — computed each time

⚠ Virtuals Don't Appear in JSON by Default

You must set { toJSON: { virtuals: true }, toObject: { virtuals: true } } in your schema options, otherwise calling res.json(user) will not include virtual fields. You cannot query or sort by virtuals — they are computed in memory only.

How to answer in an interview

"Virtuals are computed document properties that never get stored in MongoDB. Common examples: fullName derived from firstName and lastName, or age from birthYear. They save storage and keep the DB as the source of truth for just the raw fields. The two things to remember: enable toJSON: { virtuals: true } in schema options or they won't appear in API responses, and you can't use them in MongoDB queries since they don't exist in the database."

How does populate() work in Mongoose? What are its performance implications?

Medium populate References

▾

What is populate()?

populate() automatically replaces a reference field (ObjectId) with the actual document from the referenced collection. It is Mongoose's answer to SQL JOINs — but under the hood it fires a separate query, not a database-level join.

Setup & Usage

JavaScript (Mongoose)

// Schemas with references
const postSchema = new Schema({
  title:   String,
  author:  { type: Schema.Types.ObjectId, ref: 'User' },   // single ref
  tags:    [{ type: Schema.Types.ObjectId, ref: 'Tag' }]   // array of refs
});

// Basic populate
const post = await Post
  .findById(id)
  .populate('author');             // replaces ObjectId with full User doc

// Populate with field selection (only return name and email)
const post = await Post
  .findById(id)
  .populate({ path: 'author', select: 'name email -_id' });

// Populate multiple fields
const post = await Post
  .findById(id)
  .populate('author')
  .populate('tags');

// Deep / nested populate (author's company)
const post = await Post
  .findById(id)
  .populate({ path: 'author', populate: { path: 'company' } });

Performance — The N+1 Problem

populate() fires additional queries — it is not a JOIN. For a list of 50 posts, populate('author') fires 1 query for posts + 1 query per unique author. In practice Mongoose batches the second query using $in, so it's 2 queries total — but for deeply nested or large data sets this adds up.

⚠ When to Avoid populate()

If you're populating inside a loop or doing deep multi-level populate on large datasets, switch to the Aggregation Pipeline with $lookup — it runs server-side in one query and is far more efficient at scale.

How to answer in an interview

"populate() replaces an ObjectId reference with the actual document from another collection. Under the hood it fires a second query using $in — it's not a database JOIN. For most MERN apps this is fine, but for high-volume endpoints or deep nested populates, the extra round-trip becomes a bottleneck. In those cases I switch to the aggregation pipeline with $lookup, which resolves the join server-side in a single query."

What are Mongoose plugins? How do you create and use one?

Medium Plugins

▾

What is a Plugin?

A Mongoose plugin is a reusable function that adds functionality to a Schema — fields, methods, statics, virtuals, or middleware. Instead of repeating the same schema additions across multiple models, you encapsulate them in a plugin and apply it globally or per-schema.

Creating a Plugin — Soft Delete Example

JavaScript (Mongoose Plugin)

// softDelete.plugin.js
function softDeletePlugin(schema) {
  // Add fields to every schema that uses this plugin
  schema.add({
    isDeleted:  { type: Boolean, default: false },
    deletedAt:  { type: Date,    default: null  }
  });

  // Add an instance method
  schema.methods.softDelete = async function() {
    this.isDeleted = true;
    this.deletedAt = new Date();
    return this.save();
  };

  // Auto-exclude deleted documents from all finds
  schema.pre(/^find/, function(next) {
    this.where({ isDeleted: false });
    next();
  });
}

module.exports = softDeletePlugin;

// ── Usage ──
const softDelete = require('./softDelete.plugin');

// Apply to a single schema
userSchema.plugin(softDelete);

// Apply GLOBALLY to all schemas in your app
mongoose.plugin(softDelete);

Popular Community Plugins

mongoose-paginate-v2Adds paginate() method with cursor/offset support

mongoose-unique-validatorBetter error messages for unique constraint violations

mongoose-autopopulateAuto-populates refs without calling .populate()

mongoose-lean-virtualsIncludes virtuals on lean() queries

How to answer in an interview

"A Mongoose plugin is a reusable function that receives a schema and augments it — adding fields, instance methods, static methods, or middleware. The classic use case is a soft-delete plugin: add isDeleted and deletedAt fields, a softDelete() instance method, and a pre-find hook that filters out deleted records automatically. Apply it with schema.plugin(fn) or globally with mongoose.plugin(fn) to add it to every schema in the application."

Topic D — Connection & Production Patterns

How do you manage Mongoose connections properly in a Node.js/Express application?

Medium Connection Best Practices

▾

Production-Ready Connection Pattern

JavaScript (Express + Mongoose)

// db.js — connection module
const mongoose = require('mongoose');

const connectDB = async () => {
  try {
    await mongoose.connect(process.env.MONGO_URI, {
      maxPoolSize:      10,   // max connections in pool (default 5)
      serverSelectionTimeoutMS: 5000,  // fail fast if no server found
      socketTimeoutMS:  45000 // drop idle sockets after 45s
    });
    console.log('MongoDB connected');
  } catch (err) {
    console.error('MongoDB connection error:', err);
    process.exit(1);  // crash the process — let PM2/K8s restart it
  }
};

// Handle connection events
mongoose.connection.on('disconnected', () => console.warn('MongoDB disconnected'));
mongoose.connection.on('error', (err) => console.error('MongoDB error:', err));

module.exports = connectDB;

// server.js — call BEFORE starting Express
await connectDB();
app.listen(process.env.PORT, () => console.log('Server running'));

Graceful Shutdown

JavaScript

// Close connection gracefully on SIGTERM (Docker stop, K8s pod eviction)
process.on('SIGTERM', async () => {
  await mongoose.connection.close();
  process.exit(0);
});

Connection Pool — One Connection for the Whole App

Mongoose maintains a connection pool internally. Call mongoose.connect() once at startup — Mongoose reuses pooled connections across all requests. Never open a new connection per request — that's a resource leak.

Next.js / Serverless — Reuse Across Hot Reloads

In Next.js, module cache clears on hot reload but the connection stays open. Cache the connection on global._mongoose to avoid "too many connections" errors during development.

How to answer in an interview

"Connect once at startup with mongoose.connect() and let Mongoose's internal connection pool handle concurrent requests — never open a new connection per request. Key options: maxPoolSize (default 5, tune based on load), serverSelectionTimeoutMS to fail fast if MongoDB isn't reachable. In production, listen to the disconnected event and handle SIGTERM for graceful shutdown so in-flight requests complete before the process exits."

What is lean() in Mongoose? When should you use it and what do you give up?

Medium Performance lean()

▾

What is lean()?

By default, Mongoose wraps every returned document in a Mongoose Document object — a rich class instance with methods, virtuals, change tracking, and middleware support. .lean() tells Mongoose to skip this wrapping and return plain JavaScript objects directly from the driver.

Performance Impact

JavaScript (Mongoose)

// Standard query — returns Mongoose Document instances
const users = await User.find({ isActive: true });
// Each user is a full Mongoose Document: ~3x more memory, slower
users[0].save();           // ✅ available
users[0].fullName;         // ✅ virtual works

// lean() query — returns plain JS objects
const users = await User.find({ isActive: true }).lean();
// Each user is a plain object: ~2-3x faster, much less memory
users[0].save();           // ❌ TypeError — not a Document
users[0].fullName;         // ❌ undefined — no virtuals

lean() — What You Gain vs Lose

	Standard Query	lean() Query
Speed	Slower (document hydration)	2–3× faster
Memory	Higher	Much lower
.save(), .remove()	✅ Available	❌ Not available
Virtuals	✅ Work	❌ Not included (use lean-virtuals plugin)
Middleware	✅ Fires on doc methods	❌ Not relevant
Change tracking	✅ isModified() works	❌ Not available
Best for	Documents you will mutate	Read-only data: API responses, reports

Rule of Thumb

Use .lean() for any query whose result goes straight to an API response (res.json()) and won't be modified. Use standard queries when you need to call .save() or access virtuals.

How to answer in an interview

"lean() makes Mongoose return plain JavaScript objects instead of full Mongoose Document instances. It's 2–3× faster and uses significantly less memory because Mongoose skips document hydration — no change tracking, no instance methods, no virtuals. Use it on any read-only query that goes straight to an API response. Avoid it when you need to call .save(), use virtuals, or rely on instance methods on the returned documents."