Hashing
Introduction
Explained by Abdul Bari
Hashing, in its simplest form, maps an integer directly to its own index (rather than storing it and assigning it an index). This is why a large integer like 50000 would require a lot of space.
Since the data (hashed to an integer) and its index are the same, data can be checked for existence in O(1) time.
The set of keys is referred to as the "key space", while the table the data is stored in is called the "hash table".

Hashing as a function
The ideal hash function can be represented mathematically as h(x) = x, though this wastes a lot of space, so an "optimized" hash function should be used instead. An example of an optimized function is h(x) = x % size, where size refers to the hash table size.
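A minimal sketch of such a function in Python (the table size of 10 is an arbitrary choice for illustration):

```python
def h(x, size=10):
    """Map an integer key into a table of the given size."""
    return x % size

# 50000 no longer needs index 50000; it fits into a 10-slot table
print(h(50000))  # 0
print(h(23))     # 3
```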
Collisions
Using an optimized hash function can lead to "collisions", where multiple different pieces of data map to the same location.
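For example, with an arbitrary table size of 10, the keys 23 and 43 both map to index 3:

```python
size = 10
print(23 % size, 43 % size)  # 3 3 -> both keys want the same slot: a collision
```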

Collision Resolution
These methods can increase the time complexity; however, on average they still perform much better than O(log n) (the complexity of binary search).
1. Open Hashing (or Chaining)
At each location, rather than storing the data directly, a LinkedList is stored instead. Whenever a collision occurs, the new element is appended to this LinkedList (see the sketch at the end of this subsection).
Function: h(x) = x % size
Pros: (Pros/Cons are self notes, not discussed in video)
No element clustering issue
Cons:
Time complexity increases due to linear traversal of the list
Space complexity increases due to the initialization of a new LinkedList at each index
Searching Method: Evaluate the hash function to reach the required index, then traverse the LinkedList until either the element is found or the list ends.
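A minimal chaining sketch in Python, assuming plain lists in place of LinkedLists and an arbitrary table size of 10:

```python
class ChainedHashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]  # one chain per index

    def insert(self, x):
        self.table[x % self.size].append(x)     # a collision simply extends the chain

    def search(self, x):
        # evaluate the hash, then traverse the chain at that index
        return x in self.table[x % self.size]

t = ChainedHashTable()
for key in (23, 43, 13, 27):                    # 23, 43 and 13 all collide at index 3
    t.insert(key)
print(t.search(43), t.search(33))               # True False
```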

2. Linear Probing
Data is stored directly at each location until a collision occurs. If a collision occurs, f(i) takes the next value of i, which changes the hash function's output index. In case of repeated collisions, i keeps incrementing until the collisions stop (sketched in code below).
Function: h'(x) = ( h(x) + f(i) ) % size, where f(i) = i and i = 0, 1, 2, 3, 4, ...
Cons:
Clustering of elements can occur due to the linear offset function; this causes unnecessary checking of elements that otherwise would not need to be checked
Searching Method: Evaluate the hash function to reach the required index. If the data is found at that index, stop searching; otherwise increment i until either the data is found or an empty/null location is reached. If a null location is reached, the data is not present in the hash table.
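A minimal linear probing sketch in Python, again assuming an arbitrary table size of 10 and no deletions:

```python
class LinearProbingTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [None] * size

    def insert(self, x):
        # increment i until an empty slot ends the collisions
        for i in range(self.size):
            idx = (x % self.size + i) % self.size  # h'(x) = (h(x) + f(i)) % size, f(i) = i
            if self.table[idx] is None:
                self.table[idx] = x
                return idx
        raise RuntimeError("hash table is full")

    def search(self, x):
        # stop on a match or on the first empty slot
        for i in range(self.size):
            idx = (x % self.size + i) % self.size
            if self.table[idx] is None:
                return False                       # empty slot means x was never stored
            if self.table[idx] == x:
                return True
        return False

t = LinearProbingTable()
for key in (23, 43, 13):                           # all hash to index 3, land at 3, 4, 5
    t.insert(key)
print(t.search(13), t.search(33))                  # True False
```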

3. Quadratic Probing
This is very similar to linear probing, except that the offset function is quadratic. This is done to avoid the clustering issue where elements are stored too close to each other. Why is clustering bad? When searching in linear probing, you have to keep incrementing the offset function until either the data or an empty location is found. If the data is not present in the hash table, the search ends up checking more locations than necessary because elements are clustered together (the search can only stop once an empty location is found).
To resolve this issue, the offset function is made quadratic, which spreads colliding data further apart and helps prevent clustering (so the search reaches an empty slot faster when the data is not present in the hash table; see the probe sequence sketched below).
Function: h'(x) = ( h(x) + f(i) ) % size, where f(i) = i^2 and i = 0, 1, 2, 3, 4, ...
Searching Method: Same as linear probing: evaluate the hash function and increment i until either the data is found or an empty slot is found.
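A minimal sketch of the quadratic probe sequence in Python (only the offset differs from linear probing; note that, unlike linear probing, quadratic probing is not guaranteed to visit every slot, so the loop bound here is a simplification):

```python
def quadratic_probe_indices(x, size=10):
    """Yield the indices checked for key x: (h(x) + i^2) % size for i = 0, 1, 2, ..."""
    for i in range(size):
        yield (x % size + i * i) % size

# Colliding keys starting at index 3 spread out instead of piling up next to each other
print(list(quadratic_probe_indices(23)))  # [3, 4, 7, 2, 9, 8, 9, 2, 7, 4]
```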
