Lädt...

🔧 Databases: Indexes. Part 1


Nachrichtenbereich: 🔧 Programmierung
🔗 Quelle: dev.to

This article will introduce you to the concept of database indexes. Before diving into indexes, it’s recommended to first read about how databases store data, as this will provide a better understanding of how indexes fit into the overall database architecture.

What Is An Index In a Database?

An index in a database is like the table of contents in a book. It helps the database find information faster, just like a table of contents helps you quickly find a chapter in a book. Normally, when you search for something in a database, it has to scan every row in a table, which takes time. An index is a special structure that keeps track of where data is stored, allowing the database to find it quickly. It works similarly to a library catalog, which helps locate books without checking every shelf, making searches more efficient.

library-catalog

Why Are Indexes Important ?

In database management, indexes are very helpful tools. They make it easier and faster to find and retrieve specific information. Here are some key benefits:

  1. Faster Data Retrieval - when searching for a specific record, an index allows the database to find the result directly, instead of scanning every row. This significantly reduces query time.
  2. Efficient Sorting and Filtering - queries with ORDER BY or WHERE conditions run much faster because the database can use the index instead of sorting or filtering all rows manually.
  3. Better Performance for Large Databases - as data increases, queries can slow down. Indexes help keep search times consistent, even when dealing with millions of records.
  4. Optimized Joins Between Tables - indexes speed up JOIN operations by quickly matching related records between tables, making complex queries more efficient.

These benefits make indexes essential for maintaining the performance and efficiency of databases, especially as they grow larger.

How Indexes Are Stored in a Database

Indexes are stored separately from the main table data. They act as a separate data structure that is linked to the table. An index typically stores a reference to the row in the table and the indexed column's value. Most relational databases use B-trees (Balanced Trees) or hash tables for storing indexes. The structure allows efficient lookups, insertions, and deletions. Here's how it works:

  • B-trees (or B+ trees): These are the most common data structures used for indexing. The tree structure allows the database to maintain sorted values, and the search can quickly navigate the tree in a logarithmic manner, making the lookup faster. The leaf nodes of the tree contain the actual data pointers, allowing for efficient retrieval.

  • Hash Indexes: A hash table uses a hash function to convert the indexed column's value into a fixed-size hash code, which then points to the data rows. This is efficient when dealing with equality searches (WHERE column = value), but it doesn’t support range queries (WHERE column > value).

db-index

Why Indexes Are Fast

The reason why an index speeds up searches in databases is the use of binary search.

Without an index, if you wanted to search for a specific value in a large table, the database would have to scan every row to find the match. This process is known as a full table scan, and it has a time complexity of O(n), where n is the number of rows. With an index, the database doesn’t need to scan every row. It can use binary search to locate the desired data much faster. Binary search works by dividing the data in half repeatedly, narrowing down the search space quickly. Instead of checking every single record, it cuts the search space in half with each step.

For example, in a sorted list, a binary search can locate an element in O(log n) time, where n is the number of items. The larger the data set, the more significant the performance improvement when using an index.

Big O Notation Comparison

- Full Table Scan (No Index): O(n)
Every record is checked one by one. Time increases linearly with the number of records.

- Indexed Search (Using Binary Search): O(log n)
The number of checks decreases exponentially with each step. Even for large datasets, the number of operations remains much lower than a full table scan.

Big O Notation Comparison

As you can see from the graph, the time complexity for a full table scan increases linearly (O(n)) with the size of the dataset, while the time complexity for an indexed search using binary search increases logarithmically (O(log n)).

How Indexes Help with Search in a Library

Imagine you’re in a large library trying to find a specific book. The library has thousands of books, and you’re looking for a book titled “The Art of Programming”. Here’s how an index would help speed up the search:

Without an Index
If the books are randomly arranged on shelves, you’d have to check each book one by one to find the one you’re looking for. This is similar to performing a full table scan. The librarian would start at the first book and keep checking each title until they find “The Art of Programming”. If there are 100,000 books, you might have to check a lot of them before finding your desired book.

👉 Time Complexity: O(n) (You might have to check all the books, linearly).

With an Index
Now imagine the library has a catalog that lists all books alphabetically. The catalog helps you directly jump to the section where books starting with "A" are located. So, instead of scanning the whole library, you can use the catalog to quickly locate where the book should be and directly go to that section. In a database, indexes work the same way. They allow the database to directly jump to the section of the data where the relevant records are located, rather than scanning through the entire table.

Since the catalog is sorted, the librarian can apply binary search to find the book much faster. They wouldn’t check each title individually but would instead look at the middle entry and decide whether the book is before or after that entry. This process repeats, reducing the number of books the librarian needs to check.

👉 Time Complexity: O(log n) (With each step, you halve the number of books you need to check).

This catalog-based approach, powered by binary search, is much faster than checking every single book, just like how an index in a database speeds up queries.

Comparision non indexed and indexed

Why Not Index Every Column?

While indexes are incredibly useful for speeding up searches, creating an index on every column in a database can lead to several issues, making it a bad practice. Let’s explore why:

1. Performance Overhead During Data Insertion/Updates

Indexes need to be updated whenever you add, update, or delete data in a table. The more indexes you have, the more work the database has to do.

Imagine a library where every book is indexed by title, author, genre, and publication year. The title index is like a sorted list of book titles, where each title points to the shelf where the book is located. When a new book arrives, the librarian can’t just put it on a shelf randomly—they first need to insert the book’s title in the correct alphabetical position in the title index, update the author index by adding the book under the right author’s name, place it in the correct category in the genre index, and do the same for the publication year index. The more indexes exist, the more time and effort it takes to properly store the book. Similarly, in a database, each index must be updated when a new row is inserted, which slows down write operations significantly if there are too many indexes.

2. Increased Storage Usage

Each index consumes disk space. The more indexes you create, the more storage is required to maintain them. This can be problematic when dealing with large tables or databases with limited storage.

Imagine a library where, in addition to storing physical books, the librarian also keeps separate catalogs for sorting books by title, author, genre, and publication year. Each of these catalogs takes up space. If the librarian creates a separate catalog for every possible book attribute—such as number of pages, language, publisher, edition, and even cover color—the storage for these catalogs could eventually take up more space than the books themselves! Similarly, in a database, each index requires additional storage. If every column in a large table has an index, the total space used by the indexes could exceed the actual data, leading to inefficient storage usage.

3. Redundant Indexes

Some columns may not need indexing at all, especially if they're rarely used in queries. Having an index on every column would lead to redundant indexes that don’t improve performance and just add unnecessary overhead.

Imagine a library where the librarian creates a separate catalog for every single book detail, including the number of pages, cover color, and even the weight of the book. However, readers never search for books based on their weight or cover color—they only look for title, author, or genre. Maintaining these unnecessary catalogs takes time and space but provides no real benefit. Similarly, in a database, indexing columns that are rarely searched or sorted creates redundant indexes that consume resources without improving performance.

4. Decreased Read Performance

While indexes speed up reads (searches and lookups), having too many indexes can actually hurt read performance in some situations.

Imagine a library with dozens of different catalogs—by title, author, genre, publisher, year, cover color, number of pages, and more. When a reader asks for a book, the librarian must decide which catalog to use. If there are too many catalogs, it takes extra time to check them all and find the best one. Similarly, in a database, when a query is executed, the database must choose the most efficient index. If there are too many indexes, this decision-making process can slow down searches rather than speed them up. Additionally, when new books are added or removed, all these indexes need updating, causing delays and reducing overall performance.

Helpful Links 🤓

Text resources:

Video resources:

...

🔧 Databases: Indexes. Part 2


📈 33.4 Punkte
🔧 Programmierung

🔧 Databases: Indexes. Part 1


📈 33.4 Punkte
🔧 Programmierung

🔧 How to Optimize Search Queries in Large Databases with PostgreSQL and GIN Indexes


📈 27.98 Punkte
🔧 Programmierung

🔧 Designing Databases for Multi-Tenant Systems: Shared vs. Isolated Databases


📈 21.02 Punkte
🔧 Programmierung

🔧 Time-Series Databases vs. Streaming Databases: Key Differences & Use Cases


📈 21.02 Punkte
🔧 Programmierung

🔧 Time-Series Databases vs. Streaming Databases: Key Differences & Use Cases


📈 21.02 Punkte
🔧 Programmierung

🎥 From Vectors to Insights: The Power of Vector Databases #llm #datastorage #databases #ai


📈 21.02 Punkte
🎥 IT Security Video

🔧 Vector Databases vs Graph Databases: Which is Best for Retrieval-Augmented Generation (RAG)?


📈 21.02 Punkte
🔧 Programmierung

🔧 Why are NoSQL Databases beeter at Horizontal Scaling Compared to SQL Databases


📈 21.02 Punkte
🔧 Programmierung

🔧 Comparing Graph Databases with Relational Databases


📈 21.02 Punkte
🔧 Programmierung

🔧 NoSQL Databases vs Graph Databases: Which one should you use?


📈 21.02 Punkte
🔧 Programmierung

🔧 Graph Databases vs Relational Databases: What and why?


📈 21.02 Punkte
🔧 Programmierung

🔧 SQL Indexes


📈 17.47 Punkte
🔧 Programmierung

🔧 Effortless Log Management: Using TTL Indexes in MongoDB to Automate Data Cleanup


📈 17.47 Punkte
🔧 Programmierung

🎥 How Resumable Indexes in SQL Server 2019 Makes Your Job Easier | Data Exposed: MVP Edition


📈 17.47 Punkte
🎥 Video | Youtube

🔧 How to Reorganize and Rebuild Indexes in MS SQL?


📈 17.47 Punkte
🔧 Programmierung

🔧 Range indexes for LIKE queries in YugabyteDB


📈 17.47 Punkte
🔧 Programmierung

🔧 Real-time examples of clustered and non-clustered indexes


📈 17.47 Punkte
🔧 Programmierung

🔧 A Practical Guide to PostgreSQL Indexes


📈 17.47 Punkte
🔧 Programmierung

🔧 How to use composite indexes and correlated subqueries with Azure Cosmos DB | Azure Friday


📈 17.47 Punkte
🔧 Programmierung

🔧 How Database Indexes Improve SQL Query Performance


📈 17.47 Punkte
🔧 Programmierung

🔧 Dica C#: Ranges e Indexes


📈 17.47 Punkte
🔧 Programmierung

📰 Secure your Amazon Kendra indexes with the ACL using a JWT shared secret key


📈 17.47 Punkte
🔧 AI Nachrichten

🔧 Optimizing PostgreSQL Queries With Indexes: A Practical Deep Dive


📈 17.47 Punkte
🔧 Programmierung

🔧 Indexes in SQL | Clustered and Non Clustered Index


📈 17.47 Punkte
🔧 Programmierung

📰 Pandas Indexes And Headers, Have You Ever Been Confused?


📈 17.47 Punkte
🔧 AI Nachrichten

📰 Three ways to leverage composite indexes in Azure Cosmos DB


📈 17.47 Punkte
📰 IT Nachrichten

🎥 MariaDB Indexes Revisited: Beyond the Documentation


📈 17.47 Punkte
🎥 Videos

🔧 Ultimate Guide to SQL Indexes


📈 17.47 Punkte
🔧 Programmierung

🔧 Take This Unique Quiz About Duplicate Indexes In MySQL | pt-duplicate-key-checker


📈 17.47 Punkte
🔧 Programmierung

🔧 Optimizing PostgreSQL Queries With Indexes: A Deep Dive


📈 17.47 Punkte
🔧 Programmierung

🔧 Searching Items in Sitecore without Solr Indexes


📈 17.47 Punkte
🔧 Programmierung

matomo