set5.png
profile picture Preetam Sarkar
6 min read Dec 16, 2022

Why Use Set for Search Operation?

After reading this blog you will surely learn how to optimize your code and increase the performance for search operation.

Here we have used Javascript as our preferred language. Note : Install Node.js

We will learn about Array.includes() and Set.has(), both are used for searching a specific value in an Array and Set respectively.

Two important points we need to know before we proceed :
  • Array.includes() has a time complexity of O(n).
  • Set.has() has a time complexity of O(1).

Using Set.has() can drastically enhance the code performance and improve the time complexity.

Let’s jump into the example to understand it more clearly

In this example, we will demonstrate with real world logical solution. We will use both Array.includes() and Set.has() and compare the difference to see the impact.

Note : Let us consider we have a database which contains user information, we will retrieve all the user info from the database and we will filter out the info for the users who are present in an array. So users have two attributes in database id and email.

So for implementing this logic in our local machine with large data to see the impact on performance and time complexity we will :
  • Create a new NodeJS project by creating new directory.
  • Create three files inside the directory, index.js which will be our main file where we will write all our code, generate_dummy_data.js which will contains scripts to generate dummy data, data.json this json file will be our local database.

Let’s take various number of data to verify the impact in time.

Now, for 100K users in DB and filter for 100 users

Add this lines of code in generate_dummy_data.js in order to generate dummy data in data.json file to work.

const fs = require("fs");
const users = [];
for(let start=1;start<=100000;start++){
    let user = {"id":start, "email": `random${start}@gmail.com`};
    users.push(user);
}
fs.writeFileSync("./data.json",`{"users":${JSON.stringify(users)}}`);

Now run the following command in the terminal to run dummy.js

node generate_dummy_data.js

After running the script, data.json file contains 100k dummy users data. Now, paste the following code in your index.js file.

const {users} = require("./data.json");

/* activeUsers variable contains an array of active user 
for whom we want to filter the data
*/ 
const activeUsers = new Array(100);

for(let start=1;start<=100;start++){
    activeUsers[start]=start;
}
// using Set Data Structure to store the activeUsers id in a Set
const onlineUserSet = new Set(activeUsers);

console.time("Time taken for Array.includes()");
users.filter((user)=>{
    return  activeUsers.includes(user.id)
});
console.timeEnd("Time taken for Array.includes()");

console.time("Time taken for Set.has()");
users.filter((user)=>{
    return  onlineUserSet.has(user.id)
});
console.timeEnd("Time taken for Set.has()")

Now run the following command in the terminal to run index.js

node index.js

Output :

Time taken for Array.includes(): 9.314ms
Time taken for Set.has(): 3.697ms

Now, for 100K users in DB and filter for 1000 users

Now, change the value from 100 to 1000 in the for loop in index.js file.

const {users} = require("./data.json");

/* activeUsers variable contains an array of active user 
for whom we want to filter the data
*/ 
const activeUsers = new Array(1000);

for(let start=1;start<=1000;start++){
    activeUsers[start]=start;
}
// using Set Data Structure to store the activeUsers id in a Set
const onlineUserSet = new Set(activeUsers);

console.time("Time taken for Array.includes()");
users.filter((user)=>{
    return  activeUsers.includes(user.id)
});
console.timeEnd("Time taken for Array.includes()");

console.time("Time taken for Set.has()");
users.filter((user)=>{
    return  onlineUserSet.has(user.id)
});
console.timeEnd("Time taken for Set.has()")

Now run the following command in the terminal to run index.js

node index.js

Output :

Time taken for Array.includes(): 59.396ms
Time taken for Set.has(): 3.578ms

Now, for 100K users in DB and filter for 10000 users

Now, change the value from 1000 to 10000 in the for loop in index.js file.

const {users} = require("./data.json");

/* activeUsers variable contains an array of active user 
for whom we want to filter the data
*/ 
const activeUsers = new Array(10000);

for(let start=1;start<=10000;start++){
    activeUsers[start]=start;
}
// using Set Data Structure to store the activeUsers id in a Set
const onlineUserSet = new Set(activeUsers);

console.time("Time taken for Array.includes()");
users.filter((user)=>{
    return  activeUsers.includes(user.id)
});
console.timeEnd("Time taken for Array.includes()");

console.time("Time taken for Set.has()");
users.filter((user)=>{
    return  onlineUserSet.has(user.id)
});
console.timeEnd("Time taken for Set.has()")

Now run the following command in the terminal to run index.js

node index.js

Output :

Time taken for Array.includes(): 497.706ms
Time taken for Set.has(): 3.4ms

Now, for 1M users in DB and filter for 100K users

Now, change the value from 100000 to 1000000 in the for loop in generate_dummy_data.js file.

const fs = require("fs");
const users = [];
for(let start=1;start<=1000000;start++){
    let user = {"id":start, "email": `random${start}@gmail.com`};
    users.push(user);
}
fs.writeFileSync("./data.json",`{"users":${JSON.stringify(users)}}`);

Now run the following command in the terminal to run generate_dummy_data.js

node generate_dummy_data.js

After running the script, data.json file contains 1M dummy users data. Now, change the value from 10000 to 100000 in the for loop in index.js file.

const {users} = require("./data.json");

/* activeUsers variable contains an array of active user 
for whom we want to filter the data
*/ 
const activeUsers = new Array(100000);

for(let start=1;start<=100000;start++){
    activeUsers[start]=start;
}
// using Set Data Structure to store the activeUsers id in a Set
const onlineUserSet = new Set(activeUsers);

console.time("Time taken for Array.includes()");
users.filter((user)=>{
    return  activeUsers.includes(user.id)
});
console.timeEnd("Time taken for Array.includes()");

console.time("Time taken for Set.has()");
users.filter((user)=>{
    return  onlineUserSet.has(user.id)
});
console.timeEnd("Time taken for Set.has()")

Now run the following command in the terminal to run index.js

node index.js

Output :

Time taken for Array.includes(): 49.614s
Time taken for Set.has(): 41.704ms

Conclusion

At different number of data elements, at first the tests ran with less amount of time but when dealing with 1M data elements, the Set was the clear winner (array: 49.614s, set: 41.704ms). So when dealing with search operation with distinct elements Set is the most ideal data structure to use for enhancing and optimizing in time complexity.

As we can see the drastic difference in time with the examples above, this is how you can enhance and optimize your code which can scale from 0 to 1M data.

service category

Innovate faster, and go farther with serverless-native application development. Explore limitless possibilities with AntStack's serverless solutions. Empowering your business to achieve your most audacious goals.

Build with us

Author(s)

Your Digital Journey deserves a great story.

Build one with us.

Recommended Blogs

cookie

These cookies are used to collect information about how you interact with this website and allow us to remember you. We use this information in order to improve and customize your browsing experience and for analytics and metrics about our visitors on this website.

If you decline, your information won’t be tracked when you visit this website. A single cookie will be used in your browser to remember your preference not to be tracked.

Build With Us