PostgreSQL 🐘 is a robust database, but its MVCC (Multi-Version Concurrency Control) design can sometimes lead to a subtle issue: table and index bloat. If left unchecked, bloat wastes storage, increases I/O, and can slow queries down.

In this post, I’ll walk through:

  • What bloat is and why it happens
  • Different ways to measure it
  • How to keep an eye on it without killing your database
  • A few habits to reduce or fix it

What is bloat? 🧹

  • Table bloat: Every update creates a new row version, and the old one becomes dead. Those dead tuples sit there until vacuum cleans them.
  • Index bloat: Even when dead tuples are gone from the table, their index entries may stick around until an index vacuum or reindex.

Why care?

  • Bigger indexes mean slower lookups. Bigger indexes = slower index scans (more pages to read).
  • Larger tables mean more I/O. Bigger tables = more disk I/O and less cache efficiency.
  • Wasted pages reduce cache efficiency. More bloat = more work for autovacuum.

A little is fine. A lot can hurt.

Ways to measure bloat 📐

1. ioguix scripts (detailed but heavy)

The ioguixpgsql-bloat-estimation repo has two well-known queries:

They’re pretty accurate, but on a big database they can be very slow and eating too much resource.

2. Maxim Boguk’s formula (fast heuristic)

Described in pg_index_pilot. It’s not exact, but it’s fast enough to use in monitoring.

Formula:

bloat_indicator = index_size / reltuples

Here’s SQL you can run right away (no extensions needed):

SELECT 
    i.schemaname,
    c.relname AS table_name,
    i.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    c.reltuples::bigint AS row_estimate,
    CASE 
        WHEN c.reltuples > 0 
        THEN pg_relation_size(i.indexrelid) / c.reltuples 
        ELSE NULL 
    END AS bytes_per_row
FROM pg_stat_all_indexes i
JOIN pg_class c ON c.oid = i.relid
ORDER BY bytes_per_row DESC NULLS LAST
LIMIT 20;
  • bytes_per_row acts as the Boguk bloat indicator.
  • The higher it is compared to the average row size, the more bloated the index might be.
  • This is not exact, but it’s fast and lightweight.

Boguk’s bloat indicator is not measured in bytes or per cents. It is to be used in relative scenario: first, we measure the “ideal” value – the value of freshly built index. And then, we observe how the value changes over time – if it significantly bigger than the “ideal” one, it is time to reindex.

3. pgstattuple (precise but slower)

For deeper inspection, the pgstattuple extension can scan indexes or tables directly:

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('my_index');

It’s more precise, but scans every page, requires superuser and scans the whole object (can be slow).

Monitoring bloat over time 📊

A couple of lightweight checks go a long way:

  • Watch pg_stat_all_tables.n_dead_tup for dead tuples.
  • Track index sizes with pg_relation_size().
  • Use the Boguk formula as a baseline trend.
  • Occasionally validate with pgstattuple.

There are tools too:

Preventing and fixing bloat 🛠️

From experience, a mix of these helps:

  • Autovacuum tuning: Lower thresholds on busy tables so cleanup kicks in earlier.
  • Fillfactor: For frequently updated tables, setting a lower fillfactor (say 80–90%) leaves room for HOT updates.
  • Reindexing: Use REINDEX CONCURRENTLY (Postgres ≥ 12) to shrink indexes without downtime.
  • Partitioning: Breaking a giant write-heavy table into partitions can keep bloat local.
  • Drop unused indexes: If pg_stat_user_indexes.idx_scan = 0 for months, it’s a candidate for removal. Can easily win 40%.

How much does it matter? (🚦 rule of thumb)

  • <20–30% bloat → probably fine, not urgent.
  • > 50% bloat on large tables/indexes → worth attention, because performance will likely degrade.

The key is to watch trends, not just snapshots. A sudden growth spike is often more telling than a single measurement.

Final thoughts

  • Use Boguk’s formula for fast monitoring.
  • Use pgstattuple or ioguix scripts for precise checks.
  • Prevent bloat with autovacuum tuning, reindexing, and dropping unused indexes.

References 📚