How to optimize MongoDB Query - Linear and Exponential Change

To evaluate the performance and efficiency of these functions for finding or creating tags, we will look at the number of database queries each function performs and their overall approach. Here is the analysis from worst to best:

How to optimize mongodb query
How to optimize mongodb query
Visit 'JobApi Application' with Express and TypeScript and download the code from GitHub

Find Or Create Tags 1

This function concurrently processes each tag by first attempting to find it in the database; if the tag does not exist, it creates a new one, ultimately returning the IDs of all found or newly created tags.


static async findOrCreateTags1(tags: string[]) {

  // Using Promise.all to handle multiple asynchronous operations concurrently
  return Promise.all(tags.map(async (tagName: string) => {

    // Attempt to find an existing tag in the database
    let tag = await tagsSchema.findOne({ tag: tagName })

    // If the tag does not exist, create a new one
    if(!tag) {
      tag = await tagsSchema.create({ tag: tagName })
    }

    // Return the _id of the found or newly created tag
    return tag?._id

  }))
}
  • Queries per tag: Up to 2 (one findOne, one create if not found).
  • Number of queries: Up to 2 * tags.length
  • Efficiency: This function is the least efficient because it performs a separate findOne and create for each tag, leading to a high number of database queries.

Find Or Create Tags 2

This function takes an array of tag names, processes each tag concurrently to either find it or create it if it doesn't exist using findOneAndUpdate with the upsert option, and returns an array of the IDs of the found or created tags.


static async findOrCreateTags2(tags: string[]) {

  // Use Promise.all to process all tags concurrently
  return Promise.all(tags.map(async (tagName: string) => {

    // Use findOneAndUpdate with upsert to either find an existing tag or create a new one if it doesn't exist
    const tag = await tagsSchema.findOneAndUpdate(
      { tag: tagName }, // Query to find a tag by name
      { $setOnInsert: { tag: tagName } }, // Set this value if creating a new tag
      { new: true, upsert: true } // Return the new document and create if not found
    )

    // Return the ID of the found or created tag
    return tag._id
    
  }))
}
  • Queries per tag: 1 (using findOneAndUpdate with upsert).
  • Number of queries: tags.length
  • Efficiency: This is more efficient than findOrCreateTags1 since it uses findOneAndUpdate with the upsert option, ensuring a single query per tag.

Find Or Create Tags 3

The function findOrCreateTags3 finds existing tags in the database, identifies which input tags are new, inserts the new tags in a batch if any, and returns the IDs of the newly inserted tags.


static async findOrCreateTags3(tags: string[]): Promise<ObjectId[]> {

  // Step 1: Find all existing tags in the database that match any of the input tags
  const existingTags: ITag[] = await tagsSchema.find({ tag: { $in: tags } });

  // Step 2: Extract the tag names of the found existing tags into a Set for quick lookup
  const existingTagNames = new Set(existingTags.map(tag => tag.tag));

  // Step 3: Filter the input tags to get the tags that do not exist in the database
  const newTagNames = tags.filter(tagName => !existingTagNames.has(tagName));

  let insertedTags: ITag[] = [];

  // Step 4: If there are new tags, insert them into the database in a single batch operation
  if (newTagNames.length > 0) {
    insertedTags = await tagsSchema.insertMany(newTagNames.map(tagName => ({ tag: tagName })));
  }

  // Step 5: Return the IDs of the newly inserted tags
  return insertedTags.map(tag => tag._id);

}
  • Queries: 1 to find existing tags, 1 to insert new tags (if any).
  • Number of queries: 2 in the best case (or 1 if no new tags).
  • Efficiency: This function is efficient as it batches the find and insertMany operations, reducing the total number of queries significantly.

Find Or Create Tags 4

This function efficiently finds or creates tags by first querying for all existing tags in a single operation, identifying which tags need to be created, inserting any new tags in a batch, and then returning the IDs of the newly inserted tags.


static async findOrCreateTags4(tags: string[]): Promise<ObjectId[]> {
  // Step 1: Find all existing tags in one query
  const existingTags = await tagsSchema.find({ tag: { $in: tags } }).select('_id tag');

  // Step 2: Map existing tags to a Set for faster lookup
  const existingTagNames = new Set(existingTags.map(tag => tag.tag));

  // Step 3: Filter out new tags that need to be inserted
  const newTagNames = tags.filter(tagName => !existingTagNames.has(tagName));

  // Step 4: Insert new tags in a batch if there are any
  let insertedTags: ITag[] = [];
  if (newTagNames.length > 0) {
    insertedTags = await tagsSchema.insertMany(newTagNames.map(tagName => ({ tag: tagName })));
  }

  // Step 5: Return the IDs of the newly inserted tags
  return insertedTags.map(tag => tag._id);
}
  • Queries: 1 to find existing tags, 1 to insert new tags (if any).
  • Number of queries: 2 in the best case (or 1 if no new tags).
  • Efficiency: This function is similar in efficiency to findOrCreateTags3, with clear steps and logic. It's very efficient, minimizing the number of database queries by batching operations.

Find Or Create Tags 5

The function first finds existing tags in the database that match the provided array of tag names, then filters out those that are not already present, inserts the new tags in a batch operation, and finally returns the IDs of the newly inserted tags.


static async findOrCreateTags5(tags: string[]): Promise<ObjectId[]> {

  // Step 1: Find all existing tags that match the provided tags array
  const existingTags: ITag[] = await tagsSchema.find({ tag: { $in: tags } });

  // Step 2: Create a Set of existing tag names for quick lookup
  const existingTagNames = new Set(existingTags.map(tag => tag.tag));

  // Step 3: Filter out the tags that do not exist in the database
  const newTagNames = tags.filter(tagName => !existingTagNames.has(tagName));

  let insertedTags: ITag[] = [];

  // Step 4: If there are new tags, insert them into the database
  if (newTagNames.length > 0) {
    insertedTags = await tagsSchema.insertMany(newTagNames.map(tagName => ({ tag: tagName })));
  }

  // Step 5: Return the IDs of the newly inserted tags
  return insertedTags.map(tag => tag._id);
  
}
  • Queries: 1 to find existing tags, 1 to insert new tags (if any).
  • Number of queries: 2 in the best case (or 1 if no new tags).
  • Efficiency: This function is essentially the same as findOrCreateTags3 and findOrCreateTags4, providing the same level of efficiency.

Find Or Create Tags - Final Version

The function findOrCreateTags first retrieves all existing tags from the database in a single query, then determines which tags are new and need to be inserted, inserts these new tags in a batch operation, and finally returns the IDs of the newly inserted tags.


static async findOrCreateTags(tags: string[]): Promise<ObjectId[]> {

  // Step 1: Find all existing tags in one query
  const existingTags: ITag[] = await tagsSchema.find({ tag: { $in: tags } });

  // Step 2: Determine which tags need to be inserted
  const existingTagNames = existingTags.map((tag: ITag) => tag.tag);
  const newTagNames = tags.filter(tagName => !existingTagNames.includes(tagName));

  let insertedTags: ITag[] = [];

  // Step 3: Insert the missing tags in a single batch operation
  if (newTagNames.length > 0) {
    insertedTags = await tagsSchema.insertMany(newTagNames.map(tagName => ({ tag: tagName })));
  }

  // Return the IDs of the newly inserted tags
  return insertedTags.map((tag: ITag) => tag._id);
  
}
  • Queries: 1 to find existing tags, 1 to insert new tags (if any).
  • Number of queries: 2 in the best case (or 1 if no new tags).
  • Efficiency: This function is also very efficient, similar to findOrCreateTags3, findOrCreateTags4, and findOrCreateTags5.

Ranking from Worst to Best

  • findOrCreateTags1
    Reason: Performs the highest number of queries, up to 2 per tag.
  • findOrCreateTags2
    Reason: Reduces the number of queries to 1 per tag but still performs one query for each tag individually.
  • findOrCreateTags5
    Reason: Efficient with a single query to find and a single batch insert, similar to the best solutions.
  • findOrCreateTags3
    Reason: Efficient with a single query to find and a single batch insert.
  • findOrCreateTags4
    Reason: Efficient with clear steps, similar to findOrCreateTags3.
  • findOrCreateTags (final version)
    Reason: Similar efficiency to findOrCreateTags3 and findOrCreateTags4, but clearer in implementation.

Linear & Exponential Change

Linear change refers to a situation where the number of operations increases directly in proportion to the size of the input. In other words, if you double the number of tags, you double the number of operations. Examples of Linear Change in the Provided Functions:

findOrCreateTags1:
  • Number of Queries: Up to 2 * tags.length
  • Explanation: This function potentially performs two queries per tag (one findOne and one create), leading to a linear increase in the number of queries as the number of tags increases.
findOrCreateTags2:
  • Number of Queries: tags.length
  • Explanation: This function performs one query per tag using findOneAndUpdate, resulting in a linear increase in the number of queries with the number of tags.
Exponantional Change - Constant or Batch-Oriented Change

This principle is more efficient because it batch operations, minimizing the number of database interactions regardless of the input size, as much as possible.

Examples of Efficient Change in the Provided Functions:
  • findOrCreateTags3, findOrCreateTags4, findOrCreateTags5, and findOrCreateTags (final version)
  • Number of Queries: 2 (or 1 if no new tags)
  • Explanation: These functions perform a single query to find all existing tags and another single query to insert all new tags in a batch. The number of operations does not linearly increase with the number of tags; instead, it remains constant (2 queries) as long as there are new tags to insert.

In the batch-oriented change functions, while we reduce the number of database queries to a constant number, there are still multiple cycles (loops) involved in processing the data. Let's analyze the expense of these cycles and operations:

  • Mapping existing tags: existingTags.map(tag => tag.tag)
  • Filtering new tags: tags.filter(tagName => !existingTagNames.has(tagName))
  • Mapping new tags for insertion: newTagNames.map(tagName => (tag: tagName))
  • Mapping inserted tags to get IDs: insertedTags.map(tag => tag._id)
Database Queries vs. CPU Cycles:
  • Database Queries: Generally, database queries are more expensive than in-memory operations due to the overhead of network latency, disk I/O, and query processing.
  • CPU Cycles (Loops): In-memory operations like loops and filtering are typically faster because they are performed in the application's memory space.
  • Batch-Oriented Functions: Although these functions involve multiple in-memory cycles, the cost is usually lower compared to the cost of multiple database queries. The in-memory operations (mapping and filtering) are relatively fast and efficient, especially when compared to the latency and resource usage of database interactions.
Final Thoughts

Even though the batch-oriented functions involve several cycles, their design reduces the overall expense by minimizing the number of expensive database operations. The trade-off between CPU cycles and database queries strongly favors batch processing, making these functions more efficient and cost-effective despite the presence of multiple in-memory operations.

Summary & Conclusion

  • Best performance and efficiency: findOrCreateTags (final version), findOrCreateTags3, findOrCreateTags4, findOrCreateTags5
    These functions perform a minimal number of queries by batching the operations.
  • Moderate performance: findOrCreateTags2
    It performs one query per tag, which is better than the worst case but not as good as the best case.
  • Worst performance: findOrCreateTags1
    Reason: Efficient with a single query to find and a single batch insert, similar to the best solutions.
Visit 'JobApi Application' with Express and TypeScript and download the code from GitHub

Click here to read about different stratigies for search items based on tags

Click here to read more about Essential Security Practices for a Secure Node.js Application