Chapter 23: Advanced Data Structures
Advanced data structures are crucial for solving complex computational problems efficiently. They allow us to manage data in a way that reduces the time complexity of operations such as searching, updating, and querying large datasets. In this chapter, we will focus on three key advanced data structures:
- Hash Tables
- Tries
- Segment Trees and Fenwick Trees
Each of these structures is designed to address different computational challenges and is widely used in competitive programming and real-world applications. Let’s explore these structures in detail.
23.1 Hash Tables
A Hash Table is a data structure that provides an efficient way to store and retrieve data. It uses a hash function to map keys to values, enabling constant time average complexity for operations such as insertion, deletion, and search.
23.1.1 The Structure of Hash Tables
A hash table consists of an array, where each position corresponds to an index produced by a hash function. The hash function converts a key (usually a string or an integer) into an index that maps to the array position where the associated value is stored.
- Hash Function: A function that converts a key into an index.
- Collisions: When two different keys produce the same index, a collision occurs. Hash tables handle collisions using two primary methods:
- Chaining: Multiple key-value pairs are stored in a list at the same index.
- Open Addressing: The hash table searches for the next available index.
23.1.2 Operations in Hash Tables
- Insert(key, value): Insert a key-value pair into the hash table.
- Search(key): Retrieve the value associated with the key.
- Delete(key): Remove the key-value pair from the hash table.
Example Implementation of a Hash Table
Here’s a simple implementation of a hash table using chaining in C.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TABLE_SIZE 10
struct Node {
int key;
int value;
struct Node* next;
};
struct Node* hashTable[TABLE_SIZE];
int hashFunction(int key) {
return key % TABLE_SIZE;
}
void insert(int key, int value) {
int hashIndex = hashFunction(key);
struct Node* newNode = (struct Node*) malloc(sizeof(struct Node));
newNode->key = key;
newNode->value = value;
newNode->next = hashTable[hashIndex];
hashTable[hashIndex] = newNode;
}
int search(int key) {
int hashIndex = hashFunction(key);
struct Node* node = hashTable[hashIndex];
while (node != NULL) {
if (node->key == key) {
return node->value;
}
node = node->next;
}
return -1;
}
void delete(int key) {
int hashIndex = hashFunction(key);
struct Node* node = hashTable[hashIndex];
struct Node* prev = NULL;
while (node != NULL && node->key != key) {
prev = node;
node = node->next;
}
if (node == NULL) return;
if (prev == NULL) {
hashTable[hashIndex] = node->next;
} else {
prev->next = node->next;
}
free(node);
}
23.2 Tries (Prefix Trees)
A Trie (pronounced “try”) is a tree-like data structure that is used for storing strings efficiently. It is especially useful for scenarios where we need to quickly search for strings or prefixes of strings, such as in dictionary implementations and autocomplete systems.
23.2.1 Structure of a Trie
A Trie stores strings character by character. Each node in the Trie represents a character, and a path from the root to a node represents a string or a prefix. The Trie allows us to check whether a string exists, whether a prefix exists, and retrieve all strings that start with a given prefix.
Each node of a Trie has:
- Children: References to its child nodes, representing the next characters.
- End of Word Marker: A boolean value indicating if the node represents the end of a word.
23.2.2 Trie Operations
- Insert(word): Inserts a word into the Trie.
- Search(word): Checks if the word is present in the Trie.
- Prefix Search(prefix): Checks if there is any word in the Trie that starts with the given prefix.
Example Implementation of a Trie
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ALPHABET_SIZE 26
struct TrieNode {
struct TrieNode* children[ALPHABET_SIZE];
int isEndOfWord;
};
struct TrieNode* getNode() {
struct TrieNode* node = (struct TrieNode*) malloc(sizeof(struct TrieNode));
node->isEndOfWord = 0;
for (int i = 0; i < ALPHABET_SIZE; i++) {
node->children[i] = NULL;
}
return node;
}
void insert(struct TrieNode* root, const char* key) {
struct TrieNode* current = root;
for (int i = 0; i < strlen(key); i++) {
int index = key[i] - 'a';
if (!current->children[index]) {
current->children[index] = getNode();
}
current = current->children[index];
}
current->isEndOfWord = 1;
}
int search(struct TrieNode* root, const char* key) {
struct TrieNode* current = root;
for (int i = 0; i < strlen(key); i++) {
int index = key[i] - 'a';
if (!current->children[index]) {
return 0;
}
current = current->children[index];
}
return current->isEndOfWord;
}
23.3 Segment Trees and Fenwick Trees
23.3.1 Segment Trees
A Segment Tree is a data structure used for efficiently performing range queries and updates on an array. For example, if we want to calculate the sum, minimum, or maximum of elements in a range of an array and frequently update the array values, a segment tree provides a more efficient solution compared to a simple array.
A segment tree divides the array into segments, and each node in the tree stores information about a specific range. It allows range queries and point updates in O(log n) time.
Operations on Segment Tree
- Build Tree: Construct the segment tree from an array.
- Range Query: Query a range for sum, minimum, maximum, etc.
- Update: Update an element in the array and reflect the changes in the segment tree.
Example Implementation of a Segment Tree
#include <stdio.h>
int segmentTree[1000];
void buildTree(int arr[], int start, int end, int node) {
if (start == end) {
segmentTree[node] = arr[start];
} else {
int mid = (start + end) / 2;
buildTree(arr, start, mid, 2 * node + 1);
buildTree(arr, mid + 1, end, 2 * node + 2);
segmentTree[node] = segmentTree[2 * node + 1] + segmentTree[2 * node + 2];
}
}
int rangeQuery(int start, int end, int l, int r, int node) {
if (l > end || r < start) {
return 0;
}
if (l <= start && r >= end) {
return segmentTree[node];
}
int mid = (start + end) / 2;
return rangeQuery(start, mid, l, r, 2 * node + 1) +
rangeQuery(mid + 1, end, l, r, 2 * node + 2);
}
23.3.2 Fenwick Trees (Binary Indexed Trees)
A Fenwick Tree or Binary Indexed Tree (BIT) is a data structure used for efficiently performing prefix sum queries and point updates. It is easier to implement than a segment tree and is often used in problems where frequent updates and cumulative frequency queries are required.
The Fenwick Tree allows prefix sum queries in O(log n) time and updates the values in O(log n) time.
Operations in Fenwick Tree
- Update: Increment the value at a specific index.
- Prefix Sum: Compute the sum of the array from the start up to a given index.
Example Implementation of a Fenwick Tree
#include <stdio.h>
int BIT[1000];
void update(int n, int index, int value) {
while (index <= n) {
BIT[index] += value;
index += index & (-index);
}
}
int query(int index) {
int sum = 0;
while (index > 0) {
sum += BIT[index];
index -= index & (-index);
}
return sum;
}
Conclusion
This chapter explored three advanced data structures that are essential for solving a wide range of computational problems:
- Hash Tables: Provide constant-time average complexity for insertion, search, and deletion.
- Tries: Efficiently store and search for strings and prefixes.
- Segment and Fenwick Trees: Offer powerful range query and update capabilities.
Each of these structures has its unique use cases and is invaluable in different problem-solving scenarios.