Database Indexing and Query Optimization for Beginners

As a junior developer, few things feel as satisfying as writing a feature that works. However, as your application grows, you might notice that a page load that used to take 200ms now takes 5 seconds. Usually, the culprit is the database.

In this guide, we will break down how database indexing works, how to understand what your database is actually doing, and practical tips to make your queries fly.

What is a Database Index?

Imagine you are holding a 1,000-page textbook on Biology, and I ask you to find the definition of "Mitochondria." Without an index, you would have to flip through every single page, reading line by line, until you found it. In database terms, this is called a Full Table Scan, and it has a time complexity of O(N).

Now, imagine using the index at the back of the book. You look up "M", find "Mitochondria," and it gives you the specific page number. You jump straight there. This is an Index Scan, and it usually operates with a time complexity of O(log N).

Under the hood, most relational databases (like PostgreSQL, MySQL, or SQL Server) use a data structure called a B-Tree to store indexes. This structure keeps data sorted and allows the database engine to traverse a tree-like structure to find specific pointers to the actual data rows efficiently.

The Trade-off: Read vs. Write

You might ask, "If indexes make reads so fast, why don't we index every single column?"

Indexes are not free. Every time you write to a table (INSERT, UPDATE, or DELETE), the database must:

Update the actual table data.
Update every index associated with that table to keep the B-Tree balanced.

If you have 10 indexes on a table, a single insert triggers 11 write operations. Therefore, the goal is to index columns that are frequently used in WHERE, JOIN, and ORDER BY clauses, while avoiding over-indexing tables that are write-heavy.

How to Read a Query Plan

To optimize a query, you need to know how the database intends to execute it. We do this using the EXPLAIN command. While the output varies by database engine, the concepts remain the same.

Here is a simple example in SQL:

EXPLAIN SELECT * FROM users WHERE email = 'jane@example.com';

When you run this, look for these keywords in the output:

Seq Scan / Full Table Scan: The database is reading every row. This is generally bad for large tables.
Index Scan / Index Seek: The database is using an index to pinpoint data. This is good.
Key Lookup / Heap Fetch: The index told the database where the row is, but it had to go back to the main table to fetch the rest of the data (like the user's name).

If you see a Seq Scan on a table with millions of rows for a specific column, that column is a prime candidate for an index.

Practical Optimization Tips

Here are three actionable tips to speed up your queries immediately.

1. The Left-Most Prefix Rule

When you create a Composite Index (an index on multiple columns), order matters. If you create an index on (lastname, firstname), the database can use the index for:

Queries filtering by lastname
Queries filtering by lastname AND firstname

However, it cannot use the index effectively if you only filter by firstname. Think of it like a phone book: it's sorted by Last Name, then First Name. You can't easily find all people named "David" because they are scattered under every different Last Name.

2. Make Queries SARGable

SARGable stands for Search ARGument able. This essentially means writing queries that allow the engine to use the index.

The Trap: applying a function to a column.

-- BAD: The database must run the YEAR() function on every row to check the condition.
-- This prevents the use of an index on the 'created_at' column.
SELECT * FROM orders WHERE YEAR(created_at) = 2023;

The Fix: Compare the raw column against a range.

-- GOOD: The database can look up the range directly in the index.
SELECT * FROM orders 
WHERE created_at >= '2023-01-01' 
AND created_at < '2024-01-01';

3. Select Only What You Need

Avoid SELECT * in production. If you have an index on email, but you request SELECT *, the database finds the email in the index but then must perform a lookup in the main storage to get the other columns (address, phone, bio).

If you only need the ID, running SELECT id FROM users WHERE email = '...' allows the database to perform an Index Only Scan, where it gets all the data it needs directly from the index tree without ever touching the main table storage. This is significantly faster.

Summary

Indexes are like book indexes; they speed up reads but slow down writes.
Use EXPLAIN to check if your query is scanning the whole table.
Ensure your queries are SARGable by avoiding functions on the left side of your comparison.

Mastering these basics will put you ahead of the curve and keep your applications running smoothly as they scale.