JavaScript Data Structures: Hash Table: Collisions

2020-02-19

Intro 🌐

Last time, we learned what a Hash Function is and how to write a simple one.

Today, we'll learn how to handle collisions.

Problem 😔

Let's modify our old example. We write a friends application and want to save their name and if they are mean.

const friend = {
  name: "Peter",
  mean: false,
};

We use our selfmade (and bad) hash function:

const hash = (key) => {
  const chars = key.split("");
  const charCodes = chars.map((char) => char.charCodeAt());
  const charCodeSum = charCodes.reduce((acc, cur) => acc + cur);
  return charCodeSum;
};

Result:

hash("name"); // 417
hash("mean"); // 417

Shit. Two different keys have the same hash and therefore would get stored to the same index 417 of our array.

Solution 😌

Let's think about a real-life problem.

We go to the cinema and buy two tickets, I get seat 417, you also get seat 417.

How would we solve this seat collision without going back to the ticket seller?

Separate Chaining: Because the seat is really big, we can share the same seat.
Linear Probing / Open Addressing: One of us takes seat 417, the other one takes the next free seat, e.g. 418.

Separate Chaining ⛓

If we get the same hash, we store our data at the same index, but chained in a new data structure, e.g. another array.

Storing:

We hash name and get index 417
We store ["name", "Peter"] at index 417
We hash mean and get index 417, too
Because this array item is already filled with ["name", "Peter"], we create a new array around the existing item
We add ["mean", false] into the existing array at index 417
Result: an array in an array: [ ["name", "Peter"], ["mean", false] ]

Searching mean:

We hash it and get index 417
We look at index 417: [ ["name", "Peter"], ["mean", false] ]
We iterate over this array until we find mean

Linear Probing / Open Addressing 📖

If we get the same hash, we search for the next empty array index.

Storing:

We hash name and get index 417
We store ["name", "Peter"] at index 417
We hash mean and get index 417
Because this array item is already filled with ["name", "Peter"], we search for the next empty index, e.g. 418
We store ["mean", false] at index 418

Searching mean:

We hash it, get index 417
We look at index 417: ["name", "Peter"], but that's not the data we want
We go to the next array item at index 418 and there is our ["mean", false]

Note: This example uses linear probing, meaning steps are fixed, e.g. we increase the index by 1 when we search for a next free index. Open Addressing is a broader term for this method, you can read about it here.

Why we don't want to have collisions 💭

With both methods of handling the collision, Separate Chaining and Linear Probing, we would have to iterate over stored data and check if this is our desired data. The more collisions we have, the bigger the data to iterate over, the longer it takes to find our data.

Next Part ➡️

We managed to learn the theory, great work! We will start to implement our Hash Table.

Need some mentoring? Click here!

Questions ❔

Which method for handling collisions is easier to implement?
Which method would you use? Why?
What are the (dis)advantages of both methods?
Can you come up with other methods to handle collisions?

miku86