Intro 🌐
Last time, we learned what a Hash Function is and how to write a simple one.
Today, we'll learn how to handle collisions.
Problem 😔
Let's modify our old example.
We write a friends application and want to save their name
and if they are mean
.
const friend = {
name: "Peter",
mean: false,
};
We use our selfmade (and bad) hash function:
const hash = (key) => {
const chars = key.split("");
const charCodes = chars.map((char) => char.charCodeAt());
const charCodeSum = charCodes.reduce((acc, cur) => acc + cur);
return charCodeSum;
};
Result:
hash("name"); // 417
hash("mean"); // 417
Shit. Two different keys have the same hash and therefore would get stored to the same index 417
of our array.
Solution 😌
Let's think about a real-life problem.
We go to the cinema and buy two tickets, I get seat 417
, you also get seat 417
.
How would we solve this seat collision without going back to the ticket seller?
- Separate Chaining: Because the seat is really big, we can share the same seat.
- Linear Probing / Open Addressing: One of us takes seat
417
, the other one takes the next free seat, e.g.418
.
Separate Chaining ⛓
If we get the same hash, we store our data at the same index, but chained in a new data structure, e.g. another array.
Storing:
- We hash
name
and get index417
- We store
["name", "Peter"]
at index417
- We hash
mean
and get index417
, too - Because this array item is already filled with
["name", "Peter"]
, we create a new array around the existing item - We add
["mean", false]
into the existing array at index417
- Result: an array in an array:
[ ["name", "Peter"], ["mean", false] ]
Searching mean
:
- We hash it and get index
417
- We look at index
417
:[ ["name", "Peter"], ["mean", false] ]
- We iterate over this array until we find
mean
Linear Probing / Open Addressing 📖
If we get the same hash, we search for the next empty array index.
Storing:
- We hash
name
and get index417
- We store
["name", "Peter"]
at index417
- We hash
mean
and get index417
- Because this array item is already filled with
["name", "Peter"]
, we search for the next empty index, e.g.418
- We store
["mean", false]
at index418
Searching mean
:
- We hash it, get index
417
- We look at index
417
:["name", "Peter"]
, but that's not the data we want - We go to the next array item at index
418
and there is our["mean", false]
Note: This example uses linear probing, meaning steps are fixed, e.g. we increase the index by 1 when we search for a next free index. Open Addressing is a broader term for this method, you can read about it here.
Why we don't want to have collisions 💭
With both methods of handling the collision, Separate Chaining and Linear Probing, we would have to iterate over stored data and check if this is our desired data. The more collisions we have, the bigger the data to iterate over, the longer it takes to find our data.
Next Part ➡️
We managed to learn the theory, great work! We will start to implement our Hash Table.
Need some mentoring? Click here!
Further Reading 📖
Questions ❔
- Which method for handling collisions is easier to implement?
- Which method would you use? Why?
- What are the (dis)advantages of both methods?
- Can you come up with other methods to handle collisions?