🔧 Databases: Indexes. Part 1
Nachrichtenbereich: 🔧 Programmierung
🔗 Quelle: dev.to
This article will introduce you to the concept of database indexes. Before diving into indexes, it’s recommended to first read about how databases store data, as this will provide a better understanding of how indexes fit into the overall database architecture.
What Is An Index In a Database?
An index in a database is like the table of contents in a book. It helps the database find information faster, just like a table of contents helps you quickly find a chapter in a book. Normally, when you search for something in a database, it has to scan every row in a table, which takes time. An index is a special structure that keeps track of where data is stored, allowing the database to find it quickly. It works similarly to a library catalog, which helps locate books without checking every shelf, making searches more efficient.
Why Are Indexes Important ?
In database management, indexes are very helpful tools. They make it easier and faster to find and retrieve specific information. Here are some key benefits:
- Faster Data Retrieval - when searching for a specific record, an index allows the database to find the result directly, instead of scanning every row. This significantly reduces query time.
- Efficient Sorting and Filtering - queries with ORDER BY or WHERE conditions run much faster because the database can use the index instead of sorting or filtering all rows manually.
- Better Performance for Large Databases - as data increases, queries can slow down. Indexes help keep search times consistent, even when dealing with millions of records.
- Optimized Joins Between Tables - indexes speed up JOIN operations by quickly matching related records between tables, making complex queries more efficient.
These benefits make indexes essential for maintaining the performance and efficiency of databases, especially as they grow larger.
How Indexes Are Stored in a Database
Indexes are stored separately from the main table data. They act as a separate data structure that is linked to the table. An index typically stores a reference to the row in the table and the indexed column's value. Most relational databases use B-trees (Balanced Trees) or hash tables for storing indexes. The structure allows efficient lookups, insertions, and deletions. Here's how it works:
B-trees (or B+ trees): These are the most common data structures used for indexing. The tree structure allows the database to maintain sorted values, and the search can quickly navigate the tree in a logarithmic manner, making the lookup faster. The leaf nodes of the tree contain the actual data pointers, allowing for efficient retrieval.
Hash Indexes: A hash table uses a hash function to convert the indexed column's value into a fixed-size hash code, which then points to the data rows. This is efficient when dealing with equality searches (WHERE column = value), but it doesn’t support range queries (WHERE column > value).
Why Indexes Are Fast
The reason why an index speeds up searches in databases is the use of binary search.
Without an index, if you wanted to search for a specific value in a large table, the database would have to scan every row to find the match. This process is known as a full table scan, and it has a time complexity of O(n), where n is the number of rows. With an index, the database doesn’t need to scan every row. It can use binary search to locate the desired data much faster. Binary search works by dividing the data in half repeatedly, narrowing down the search space quickly. Instead of checking every single record, it cuts the search space in half with each step.
For example, in a sorted list, a binary search can locate an element in O(log n) time, where n is the number of items. The larger the data set, the more significant the performance improvement when using an index.
Big O Notation Comparison
- Full Table Scan (No Index): O(n)
Every record is checked one by one. Time increases linearly with the number of records.
- Indexed Search (Using Binary Search): O(log n)
The number of checks decreases exponentially with each step. Even for large datasets, the number of operations remains much lower than a full table scan.
As you can see from the graph, the time complexity for a full table scan increases linearly (O(n)) with the size of the dataset, while the time complexity for an indexed search using binary search increases logarithmically (O(log n)).
How Indexes Help with Search in a Library
Imagine you’re in a large library trying to find a specific book. The library has thousands of books, and you’re looking for a book titled “The Art of Programming”. Here’s how an index would help speed up the search:
Without an Index
If the books are randomly arranged on shelves, you’d have to check each book one by one to find the one you’re looking for. This is similar to performing a full table scan. The librarian would start at the first book and keep checking each title until they find “The Art of Programming”. If there are 100,000 books, you might have to check a lot of them before finding your desired book.
👉 Time Complexity: O(n) (You might have to check all the books, linearly).
With an Index
Now imagine the library has a catalog that lists all books alphabetically. The catalog helps you directly jump to the section where books starting with "A" are located. So, instead of scanning the whole library, you can use the catalog to quickly locate where the book should be and directly go to that section. In a database, indexes work the same way. They allow the database to directly jump to the section of the data where the relevant records are located, rather than scanning through the entire table.
Since the catalog is sorted, the librarian can apply binary search to find the book much faster. They wouldn’t check each title individually but would instead look at the middle entry and decide whether the book is before or after that entry. This process repeats, reducing the number of books the librarian needs to check.
👉 Time Complexity: O(log n) (With each step, you halve the number of books you need to check).
This catalog-based approach, powered by binary search, is much faster than checking every single book, just like how an index in a database speeds up queries.
Why Not Index Every Column?
While indexes are incredibly useful for speeding up searches, creating an index on every column in a database can lead to several issues, making it a bad practice. Let’s explore why:
1. Performance Overhead During Data Insertion/Updates
Indexes need to be updated whenever you add, update, or delete data in a table. The more indexes you have, the more work the database has to do.
Imagine a library where every book is indexed by title, author, genre, and publication year. The title index is like a sorted list of book titles, where each title points to the shelf where the book is located. When a new book arrives, the librarian can’t just put it on a shelf randomly—they first need to insert the book’s title in the correct alphabetical position in the title index, update the author index by adding the book under the right author’s name, place it in the correct category in the genre index, and do the same for the publication year index. The more indexes exist, the more time and effort it takes to properly store the book. Similarly, in a database, each index must be updated when a new row is inserted, which slows down write operations significantly if there are too many indexes.
2. Increased Storage Usage
Each index consumes disk space. The more indexes you create, the more storage is required to maintain them. This can be problematic when dealing with large tables or databases with limited storage.
Imagine a library where, in addition to storing physical books, the librarian also keeps separate catalogs for sorting books by title, author, genre, and publication year. Each of these catalogs takes up space. If the librarian creates a separate catalog for every possible book attribute—such as number of pages, language, publisher, edition, and even cover color—the storage for these catalogs could eventually take up more space than the books themselves! Similarly, in a database, each index requires additional storage. If every column in a large table has an index, the total space used by the indexes could exceed the actual data, leading to inefficient storage usage.
3. Redundant Indexes
Some columns may not need indexing at all, especially if they're rarely used in queries. Having an index on every column would lead to redundant indexes that don’t improve performance and just add unnecessary overhead.
Imagine a library where the librarian creates a separate catalog for every single book detail, including the number of pages, cover color, and even the weight of the book. However, readers never search for books based on their weight or cover color—they only look for title, author, or genre. Maintaining these unnecessary catalogs takes time and space but provides no real benefit. Similarly, in a database, indexing columns that are rarely searched or sorted creates redundant indexes that consume resources without improving performance.
4. Decreased Read Performance
While indexes speed up reads (searches and lookups), having too many indexes can actually hurt read performance in some situations.
Imagine a library with dozens of different catalogs—by title, author, genre, publisher, year, cover color, number of pages, and more. When a reader asks for a book, the librarian must decide which catalog to use. If there are too many catalogs, it takes extra time to check them all and find the best one. Similarly, in a database, when a query is executed, the database must choose the most efficient index. If there are too many indexes, this decision-making process can slow down searches rather than speed them up. Additionally, when new books are added or removed, all these indexes need updating, causing delays and reducing overall performance.
Helpful Links 🤓
Text resources:
- Introduction To Database Indexing
- How Does Database Indexing Work?
- An in-depth look at Database Indexing
Video resources:
- what is a database index?
- Big-O Notation in 100 Seconds
- SQL Indexes Explained in 20 Minutes
- Database Indexing for Dumb Developers
- Binary Search Algorithm in 100 Seconds
- SQL indexing best practices | How to make your database FASTER!
- Database Design 39 - Indexes (Clustered, Nonclustered, Composite Index)
🔧 Databases: Indexes. Part 2
📈 33.4 Punkte
🔧 Programmierung
🔧 Databases: Indexes. Part 1
📈 33.4 Punkte
🔧 Programmierung
🔧 SQL Indexes
📈 17.47 Punkte
🔧 Programmierung
🔧 Range indexes for LIKE queries in YugabyteDB
📈 17.47 Punkte
🔧 Programmierung
🔧 A Practical Guide to PostgreSQL Indexes
📈 17.47 Punkte
🔧 Programmierung
🔧 Dica C#: Ranges e Indexes
📈 17.47 Punkte
🔧 Programmierung
🎥 MariaDB Indexes Revisited: Beyond the Documentation
📈 17.47 Punkte
🎥 Videos
🔧 Ultimate Guide to SQL Indexes
📈 17.47 Punkte
🔧 Programmierung