The Architect’s Notebook

Ep #76: Tuning Postgres: The Art of Vacuum, Fillfactor, and Advanced Indexing

Moving beyond CREATE INDEX: How to optimize Postgres for high-velocity updates and massive datasets.

Breaking down complex System Design components

By Amit Raghuvanshi | The Architect’s Notebook
🗓️ Jan 22, 2026 · Deep Dive


The MVCC Trap & Fillfactor

The Postgres Quirk: MVCC

Most developers treat Postgres like a black box: put data in, get data out. But Postgres has a unique architecture called MVCC (Multi-Version Concurrency Control). When you UPDATE a row in Postgres, it does not overwrite the old data. Instead, it:

  1. Marks the old row (tuple) as “dead.”

  2. Inserts a new version of the row.

  3. Updates all indexes to point to the new location.

This means every UPDATE is effectively a DELETE plus an INSERT. This leads to two massive problems: Table Bloat (dead rows taking up space) and Write Amplification (updating indexes unnecessarily).

If you have a table with 10 indexes, and you update the last_login column, Postgres has to update all 10 indexes to point to the new row location, even if the other indexed columns didn’t change. This is massive Write Amplification.
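You can watch this happen in the statistics views. A minimal sketch, assuming a users table with id and last_login columns: run a burst of updates, then look at the dead-tuple counter that VACUUM will later have to clean up.

UPDATE users SET last_login = now() WHERE id <= 1000;

SELECT relname,
       n_live_tup,   -- estimated live rows
       n_dead_tup    -- dead row versions left behind by updates/deletes
FROM pg_stat_user_tables
WHERE relname = 'users';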

In this post, we will fix your Postgres configuration to handle high-velocity updates without bloating your disk.

Here are 4 strategies to fix this.


Strategy 1: Tuning FILLFACTOR (The “Elbow Room” Strategy)

By default, Postgres packs data pages 100% full (FILLFACTOR = 100). This is great for read-only data, but terrible for updates.

The Scenario: You have a page that is 100% full. You update one row. Because the page is full, the new version of the row cannot fit on the same page, so Postgres has to write it to a completely different page on disk. And because the row moved pages, every single index pointing to that row must be updated with the new page address.

The Fix: Set FILLFACTOR to 80 or 90 for write-heavy tables.

ALTER TABLE users SET (fillfactor = 90);

This tells Postgres: “Only fill pages up to 90%. Leave 10% empty.” Now, when you update a row, Postgres places the new version in that 10% empty space on the same page.

The Magic: HOT Updates (Heap-Only Tuples)

If the new row version stays on the same page and none of the changed columns are indexed, Postgres performs a HOT update: it does not touch the indexes at all, it simply links the old tuple to the new one within the same page.

  • Without HOT: update the heap + update all 5 indexes.

  • With HOT: update the heap only.

Result: roughly 80% less write I/O for update-heavy workloads.
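If you want to see a table's current setting, pg_class.reloptions shows any non-default storage parameters. A quick check, assuming the users table from above:

SELECT relname, reloptions
FROM pg_class
WHERE relname = 'users';
-- reloptions is NULL until you set a non-default storage parameter such as fillfactor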

When to Use Different FILLFACTOR Values

-- High-frequency updates (e.g., session tracking, real-time counters)
ALTER TABLE sessions SET (fillfactor = 70);

-- Moderate updates (e.g., user profiles)
ALTER TABLE users SET (fillfactor = 85);

-- Mostly reads, rare updates (e.g., product catalog)
ALTER TABLE products SET (fillfactor = 95);

-- Append-only (e.g., immutable logs)
ALTER TABLE audit_logs SET (fillfactor = 100);
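One caveat worth knowing: a new fillfactor only applies to pages written after the change. Existing pages keep their old packing until the table is rewritten, for example with VACUUM FULL (which takes an ACCESS EXCLUSIVE lock) or a tool like pg_repack:

-- Rewrite the table so existing pages honor the new fillfactor.
-- VACUUM FULL locks the table exclusively; run it in a maintenance window.
VACUUM FULL sessions;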

Monitoring HOT Updates

Check if your updates are benefiting from HOT:

SELECT 
    schemaname,
    relname,
    n_tup_upd,
    n_tup_hot_upd,
    ROUND(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 2) AS hot_update_ratio
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY hot_update_ratio ASC;

Goal: HOT update ratio > 80% for frequently updated tables.


Strategy 2: Partial Indexes (The “Sniper” Strategy)

Stop indexing everything. Use Partial Indexes to index only the rows you actually query.

Scenario: Orders Table

You have an orders table with 10 million rows. 95% of orders are completed, and you never query them. You only query pending orders (the active 5%).

Bad Index:

CREATE INDEX idx_status ON orders(status);
-- Indexes ALL 10 million rows. Huge size. Slow writes.

Good Index:

CREATE INDEX idx_pending ON orders(status) WHERE status = 'pending';
-- Indexes only the ~500k pending rows. Tiny size. Fast writes.

The Partial Index is:

  1. Smaller: ~500k rows vs 10M rows; it fits in RAM easily.

  2. Faster to read: less data to scan.

  3. Faster to write: if you insert a completed order, the index is not touched at all. Zero write overhead. It is only updated when a row's status changes to or from 'pending'.
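One thing to keep in mind: the planner will only pick a partial index when the query's WHERE clause implies the index predicate. A quick check, using the idx_pending index from above:

-- Should use idx_pending: the query predicate matches the index predicate
EXPLAIN SELECT * FROM orders WHERE status = 'pending';

-- Cannot use idx_pending: completed rows were never indexed
EXPLAIN SELECT * FROM orders WHERE status = 'completed';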

More Partial Index Examples

Active users only:

CREATE INDEX idx_active_users ON users(last_login) 
WHERE active = true;

Recent orders (an index predicate must use only immutable expressions, so NOW() is not allowed here; use a fixed cutoff and recreate the index periodically):

CREATE INDEX idx_recent_orders ON orders(created_at)
WHERE created_at > '2026-01-01';

High-value transactions:

CREATE INDEX idx_high_value ON transactions(amount) 
WHERE amount > 10000;

Performance Impact Example

-- Before: Full index
CREATE INDEX idx_all_orders_status ON orders(status);
-- Size: 450 MB
-- Write overhead: Every insert updates index

-- After: Partial index
CREATE INDEX idx_pending_orders ON orders(status) 
WHERE status IN ('pending', 'processing');
-- Size: 12 MB
-- Write overhead: Only 5% of inserts update index

-- Result: 97% reduction in index size
-- 95% reduction in write overhead
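The sizes above are illustrative; measure your own indexes before and after. One way to do that from the statistics views:

SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'orders';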

Strategy 3: Covering Indexes (Index-Only Scans)

If a query is very frequent, include the “payload” data in the index itself using the INCLUDE clause.

Example Query

SELECT email FROM users WHERE id = 5;

Standard Index:

  1. Postgres finds id = 5 in the B-Tree.

  2. It jumps to the Heap (on disk) to fetch the email.

Result: 2 disk reads.

Covering Index:

CREATE INDEX idx_id_email ON users(id) INCLUDE (email);

Now, the email is stored inside the B-Tree leaf node. Postgres answers the query purely from the index without ever touching the main table (Heap). 1 disk read.
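One version note: INCLUDE requires Postgres 11 or newer. On older versions you can get the same index-only-scan behavior by putting the payload column directly in the key (the index name below is just illustrative):

CREATE INDEX idx_id_email_key ON users(id, email);
-- email becomes a full key column here: the index is slightly larger,
-- but SELECT email ... WHERE id = ? can still be answered from the index alone.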

When to Use Covering Indexes

Good candidates:

  • Small additional columns (IDs, booleans, short strings)

  • Frequently executed queries

  • Columns already in SELECT but not in WHERE

Bad candidates:

  • Large columns (TEXT, JSON, arrays) — bloats index

  • Rarely queried columns

  • Columns that change frequently (defeats HOT updates)

Real-World Example

-- Common query: Get user email and username by ID
SELECT email, username FROM users WHERE id = ?;

-- Covering index
CREATE INDEX idx_users_lookup ON users(id) INCLUDE (email, username);

-- Performance gain:
-- Before: 2 disk I/O (index + heap)
-- After: 1 disk I/O (index only)
-- 50% reduction in I/O

Verify Index-Only Scan

EXPLAIN (ANALYZE, BUFFERS) 
SELECT email FROM users WHERE id = 12345;

Look for: Index Only Scan in the query plan.
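If you see a plain Index Scan instead, or an Index Only Scan with a high "Heap Fetches" count, the table's visibility map is likely stale; index-only scans can only skip the heap for pages that VACUUM has marked all-visible:

VACUUM (ANALYZE) users;  -- refreshes the visibility map so index-only scans can avoid heap fetches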


Strategy 4: Composite Index Strategy (Column Order Matters)

The order of columns in a composite index is critical. Postgres can only use an index if your query matches the leftmost prefix.

The Rule: High Selectivity First

-- Bad: status (only 4 distinct values) first, then user_id (millions of values)
CREATE INDEX idx_bad ON orders(status, user_id);

-- Better: user_id (high selectivity) first, then status
CREATE INDEX idx_good ON orders(user_id, status);

Why? The first column narrows down the search space. If you search by user_id, you immediately eliminate 99.99% of rows. Then filtering by status is cheap.

Leftmost Prefix Rule

CREATE INDEX idx_composite ON orders(user_id, status, created_at);

-- ✅ These queries can use the index:
WHERE user_id = 5
WHERE user_id = 5 AND status = 'pending'
WHERE user_id = 5 AND status = 'pending' AND created_at > '2024-01-01'

-- ❌ These queries CANNOT use the index efficiently:
WHERE status = 'pending'  -- Skips user_id
WHERE created_at > '2024-01-01'  -- Skips user_id and status
WHERE status = 'pending' AND created_at > '2024-01-01'  -- Skips user_id

Multi-Index Strategy

If you need to query by different column combinations, create multiple indexes:

-- For: WHERE user_id = ? AND status = ?
CREATE INDEX idx_user_status ON orders(user_id, status);

-- For: WHERE status = ? AND created_at > ?
CREATE INDEX idx_status_date ON orders(status, created_at);

-- For: WHERE created_at > ?
CREATE INDEX idx_created_at ON orders(created_at);

Yes, this increases write overhead. Balance is key.
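To keep that balance honest, periodically look for indexes that are never used and drop them; the statistics collector tracks how many scans each index has served since the stats were last reset:

SELECT schemaname,
       relname,
       indexrelname,
       idx_scan   -- scans since stats reset; 0 is a candidate for removal
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;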


🔒 Subscribe to read the Advanced Tuning Guide

We have optimized the table layout. Now we need to optimize the cleanup process.

Postgres creates “dead rows.” Something needs to clean them up. That something is VACUUM. If you don’t tune it, your database will bloat until it runs out of disk space.

In the rest of this deep dive, we will cover:

  • Advanced Index Types: When to use BRIN (for massive time-series data) and GIN (for JSONB/Arrays).

  • Vacuum Tuning Checklist: The specific autovacuum settings you need to change to prevent bloat.

  • Bloat Detection: SQL queries to find out which indexes are wasting space.

  • Zero-Downtime Migration: How to use CREATE INDEX CONCURRENTLY safely in production.

Upgrade to Paid to Tune Your Database
