How to Setup MongoDB Text Search with Node.js

2016/03/19 · 9 min read

Table of Contents

  1. Introduction
  2. How to Use MongoDB Text Search
  3. Stop Words and Stemming in MongoDB Text Search
  4. Getting Started with MongoDB Text Search
  5. What's Going On, Doesn't Work!
  6. So, What Did I Do?
  7. How Does MongoDB Define Score for Text Search?
  8. Conclusion

Introduction

MongoDB has provided text search since version 2.4. This post shows how to use it. All of the examples below were run on MongoDB 3.0.5, and I used Robomongo (0.8.5) to run the MongoDB commands.

I also have a code snippet showing how to use text search with Mongoose, but I recommend using at least MongoDB 2.6, because MongoDB 2.4 with Mongoose has some problems.

How to Use MongoDB Text Search

Stop Words and Stemming in MongoDB Text Search

Before jumping into how to use MongoDB text search, we need to understand how MongoDB handles stop words and stemming. Stop words are common words that MongoDB leaves out of the index: if you have "I like apple", the indexed word will be apple. Stemming reduces a word to its root: for "cook, cooking, cooks", MongoDB's text engine will index "cook".

I like apple, also I enjoy cooking.

In the example above, MongoDB will index apple and cook. There is a very good blog post about this topic: text-search-mongodb-stemming.
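To make both ideas concrete, here is a toy JavaScript sketch that runs a sentence through stop-word removal and stemming. It is only an illustration: the stop-word list, the suffix rules, and the helper names (naiveStem, indexTerms) are hypothetical simplifications. MongoDB's text engine uses its own language-specific stop-word lists and a Snowball stemmer, so real index terms will differ for many words.

```javascript
// Toy illustration only. The stop-word list and suffix rules below are
// hypothetical; MongoDB uses per-language stop-word lists and a Snowball stemmer.
const STOP_WORDS = new Set(["i", "we", "she", "and", "are", "the", "a", "an"]);
const SUFFIXES = ["ing", "ed", "s"];

// Strip the first matching suffix, but only if at least 3 characters remain.
function naiveStem(word) {
  for (const suffix of SUFFIXES) {
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Lowercase, split into words, drop stop words, stem, and de-duplicate,
// roughly mirroring the steps a text index applies before storing terms.
function indexTerms(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  const terms = new Set();
  for (const word of words) {
    if (!STOP_WORDS.has(word)) {
      terms.add(naiveStem(word));
    }
  }
  return [...terms];
}

console.log(indexTerms("I cook, she cooks, and we are cooking apples."));
// [ 'cook', 'apple' ]
```

Three forms of "cook" collapse into one term, and the pronouns and "and" never reach the index, which is why a search for "cooking" can match a document that only says "cooks".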

Getting Started with MongoDB Text Search

Get MongoDB's Version

This command will tell you what version of MongoDB you're using.

db.version()
// 3.0.5

Get Current MongoDB Database Name

This will return your current database name. Before you run your query, you need to check that you're connected to the correct MongoDB database.

db.getName()
// the current database name.

Check Index of Current Collection

Run the following command to check whether your collection already has a text index (or any other indexes).

The following example checks which indexes my posts collection has.

// Run this:
db.posts.getIndexes()
// You will get a result like the following (Robomongo shows the returned array as an object keyed "0", "1", ...). Normally you will see at least the default _id index. As you can see, this collection has no text index yet.
{
    "0" : {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "dake-tech.posts"
    },
    "1" : {
        "v" : 1,
        "unique" : true,
        "key" : {
            "slug" : 1
        },
        "name" : "slug_1",
        "ns" : "dake-tech.posts",
        "background" : true,
        "safe" : null
    }
}
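If you want to check for a text index from code, for example in a Node.js setup script, you can scan the index list for the internal key that text indexes use, { _fts: "text", _ftsx: 1 }, which shows up in the getIndexes() output for a text index. A minimal sketch, assuming you already have the index list as a plain array or as the Robomongo-style object above; the helper name findTextIndex is hypothetical.

```javascript
// Given the list that db.collection.getIndexes() returns (Robomongo shows it
// as an object keyed "0", "1", ...), return the text index, or null if none.
// Text indexes are recognizable by the internal key { _fts: "text", _ftsx: 1 }.
function findTextIndex(indexes) {
  const list = Array.isArray(indexes) ? indexes : Object.values(indexes);
  return list.find((index) => index.key && index.key._fts === "text") || null;
}

// Sample data copied from the getIndexes() output above.
const postsIndexes = [
  { v: 1, key: { _id: 1 }, name: "_id_", ns: "dake-tech.posts" },
  { v: 1, unique: true, key: { slug: 1 }, name: "slug_1", ns: "dake-tech.posts" },
];

console.log(findTextIndex(postsIndexes)); // null: no text index yet
```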

Now we're going to add a text index to this collection. A document in the posts collection looks like the following. I'd like to create a text index on content.brief, content.extended, and title.

{
    "_id" : ObjectId("535cc04cd06eabf033ee36bc"),
    "__v" : 0,
    "author" : ObjectId("535c4fe25115d24c0c4cf84b"),
    "categories" : [],
    "content" : {
        "brief" : "Looks like http://embed.ly/ is a great site if I want to create something with a URL. It'll get the short version of the URL's detail.",
        "extended" : ""
    },
    "publishedDate" : ISODate("2014-04-01T07:00:00.000Z"),
    "slug" : "embedly-get-url-link-detail",
    "state" : "published",
    "title" : "embed.ly - Get url link detail"
}

Add Text Index to Title

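First, let's add a text index to the title. (In MongoDB 3.0 and later, db.posts.createIndex() is the preferred name for this; ensureIndex() still works as a deprecated alias.)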

db.posts.ensureIndex({title: "text"});

After creating the text index on the title field, I ran db.posts.getIndexes() again and got the following new entry, which confirms that the title field of posts now has a text index.

"4" : {
    "v" : 1,
    "key" : {
        "_fts" : "text",
        "_ftsx" : 1
    },
    "name" : "title_text",
    "ns" : "dake-tech.posts",
    "weights" : {
        "title" : 1
    },
    "default_language" : "english",
    "language_override" : "language",
    "textIndexVersion" : 2
}
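The name title_text follows MongoDB's default naming convention, which you can also see in slug_1: each field in the key pattern is joined with its index type or direction, and the pairs are joined with underscores. The built-in _id index is a special case named _id_. A small sketch of that convention; the helper name defaultIndexName is hypothetical.

```javascript
// Sketch of MongoDB's default index naming: each "field: value" pair in the
// key pattern becomes "field_value", and the pairs are joined with "_".
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, value]) => `${field}_${value}`)
    .join("_");
}

console.log(defaultIndexName({ title: "text" })); // title_text
console.log(defaultIndexName({ slug: 1 }));       // slug_1
```

Knowing the name lets you drop or inspect the index later without looking it up first.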

Validate the Index

MongoDB also has db.collection.validate(), which checks a collection's data and indexes. When I ran db.posts.validate(), I got the following result. The keysPerIndex entry "dake-tech.posts.$title_text" : 1464 tells you that the text index on title contains 1464 keys.

{
    "ns" : "dake-tech.posts",
    "datasize" : 4.72926e+006,
    "nrecords" : 325,
    "lastExtentSize" : 1.67772e+007,
    "firstExtent" : "0:ef000 ns:dake-tech.posts",
    "lastExtent" : "0:351000 ns:dake-tech.posts",
    "extentCount" : 4,
    "firstExtentDetails" : {
        "loc" : "0:ef000",
        "xnext" : "0:111000",
        "xprev" : "null",
        "nsdiag" : "dake-tech.posts",
        "size" : 8192,
        "firstRecord" : "0:ef0b0",
        "lastRecord" : "0:f0eb0"
    },
    "lastExtentDetails" : {
        "loc" : "0:351000",
        "xnext" : "null",
        "xprev" : "0:151000",
        "nsdiag" : "dake-tech.posts",
        "size" : 1.67772e+007,
        "firstRecord" : "0:3510b0",
        "lastRecord" : "0:591230"
    },
    "deletedCount" : 5,
    "deletedSize" : 1.44095e+007,
    "nIndexes" : 5,
    "keysPerIndex" : {
        "dake-tech.posts.$_id_" : 325,
        "dake-tech.posts.$slug_1" : 325,
        "dake-tech.posts.$state_1" : 325,
        "dake-tech.posts.$publishedDate_1" : 325,
        "dake-tech.posts.$title_text" : 1464
    },
    "valid" : true,
    "errors" : [],
    "warning" : "Some checks omitted for speed. use {full:true} option to do more thorough scan.",
    "ok" : 1.0000000000000000
}
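To compare key counts across indexes without reading the whole report, you can pull keysPerIndex out of the validate() result. A sketch, assuming the result is already available as a plain object; keysByIndexName is a hypothetical helper. Dividing the text index's key count by nrecords gives a rough average of how many distinct indexed terms each title contributes.

```javascript
// Turn the keysPerIndex section of db.collection.validate() output into a
// map from index name to key count, by stripping the "<namespace>.$" prefix.
function keysByIndexName(validateResult) {
  const prefix = `${validateResult.ns}.$`;
  const counts = {};
  for (const [fullName, count] of Object.entries(validateResult.keysPerIndex)) {
    const name = fullName.startsWith(prefix) ? fullName.slice(prefix.length) : fullName;
    counts[name] = count;
  }
  return counts;
}

// Trimmed copy of the validate() output above.
const validateResult = {
  ns: "dake-tech.posts",
  nrecords: 325,
  keysPerIndex: {
    "dake-tech.posts.$_id_": 325,
    "dake-tech.posts.$title_text": 1464,
  },
};

const counts = keysByIndexName(validateResult);
// Average number of text index keys per document.
console.log((counts.title_text / validateResult.nrecords).toFixed(1)); // 4.5
```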

Running a Text Search

If you type db.posts.runCommand("text", {search: "mongodb"}) on MongoDB 3.0, you will get a result like the following:

{
    "ok" : 0.0000000000000000,
    "errmsg" : "no such command: text",
    "code" : 59,
    "bad cmd" : {
        "text" : "posts",
        "search" : "mongodb"
    }
}

What's Going On, Doesn't Work!

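You might try switching to the admin database and enabling text search, as shown below. The setParameter command succeeds, but the text command still fails with the same error. The textSearchEnabled parameter was only needed in MongoDB 2.4; since 2.6, text search is enabled by default and the $text query operator replaces the text command, which was deprecated in 2.6 and no longer exists in 3.0 and above.

use admin
db.runCommand({setParameter: 1, textSearchEnabled: true})
// "was" : true means text search was already enabled:
{
    "was" : true,
    "ok" : 1.0000000000000000
}

// Switch back to your own database and try again; on MongoDB 3.0+ it still
// returns "no such command: text".
db.posts.runCommand("text", {search: "mongodb"})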
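Because the right syntax depends on the server version, a script can decide up front which approach to use. A minimal sketch, assuming you pass in the string returned by db.version(); the helper names are hypothetical, and the thresholds encode the version history described above plus the fact that $caseSensitive and $diacriticSensitive arrived in MongoDB 3.2.

```javascript
// Hypothetical helper: given the string from db.version(), report which text
// search features apply. 2.4 needs textSearchEnabled and the "text" command,
// 2.6 adds $text and enables text search by default, 3.0 removes the "text"
// command, and 3.2 adds $caseSensitive / $diacriticSensitive.
function parseVersion(version) {
  const [major, minor] = version.split(".").map(Number);
  return { major, minor };
}

// True if version >= major.minor.
function atLeast(version, major, minor) {
  const v = parseVersion(version);
  return v.major > major || (v.major === major && v.minor >= minor);
}

function textSearchSupport(version) {
  return {
    needsTextSearchEnabled: atLeast(version, 2, 4) && !atLeast(version, 2, 6),
    textCommand: atLeast(version, 2, 4) && !atLeast(version, 3, 0),
    textOperator: atLeast(version, 2, 6),
    caseAndDiacriticOptions: atLeast(version, 3, 2),
  };
}

console.log(textSearchSupport("3.0.5"));
```

For 3.0.5 this reports that $text is available, the text command is not, and the case and diacritic options are not yet supported.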

So, What Did I Do?

If your MongoDB version is 3.0 or above (like my current MongoDB 3.0.5), you can try the following:

db.posts.find( { $text: { $search: "javascript" } } )

This time the query works. Awesome, right? It returned 12 results.
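The argument to find() is an ordinary document, so the same object works in the mongo shell and with the Node.js driver's collection.find(). A small sketch of building it; textFilter is a hypothetical helper, and the optional fields are only added when you set them, because $caseSensitive and $diacriticSensitive need MongoDB 3.2+.

```javascript
// Build the filter document for a $text query, adding optional
// operators only when a value is supplied.
function textFilter(search, { language, caseSensitive, diacriticSensitive } = {}) {
  const text = { $search: search };
  if (language !== undefined) text.$language = language;
  if (caseSensitive !== undefined) text.$caseSensitive = caseSensitive;
  if (diacriticSensitive !== undefined) text.$diacriticSensitive = diacriticSensitive;
  return { $text: text };
}

console.log(JSON.stringify(textFilter("javascript")));
// {"$text":{"$search":"javascript"}}
```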

Adding Text Index to Multiple Fields

Now I want to add two more fields to the text index: content.brief and content.extended. So I ran db.posts.ensureIndex({'content.brief': 'text'}); and got the following error. What's going on? Can a collection only have one text index?

{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 3,
    "errmsg" : "exception: Index with pattern: { _fts: \"text\", _ftsx: 1 } already exists with different options",
    "code" : 85,
    "ok" : 0
}

Yes: a collection can have at most one text index (true in 3.0.5 and a general MongoDB restriction), but that one index can cover multiple fields. To index several fields, first drop the old text index with db.posts.dropIndex('title_text').

This is the result you will get:

{
    "nIndexesWas" : 5,
    "ok" : 1.0000000000000000
}

Now, you can run the following command to add a text index to multiple fields:

db.posts.ensureIndex({title: 'text', 'content.brief': 'text', 'content.extended': 'text'});
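Since only one text index is allowed, changing its fields always means "drop, then create". A sketch of a helper that plans those steps from the getIndexes() output; planTextIndex is hypothetical, it emits shell commands as strings, and it uses createIndex(), which is the MongoDB 3.0+ name for ensureIndex(). It compares against the existing index's weights to skip the rebuild when the fields already match.

```javascript
// Hypothetical helper: given getIndexes() output and the fields to index,
// return the shell commands needed, in order (empty if nothing to do).
function planTextIndex(collection, indexes, fields) {
  const keys = {};
  for (const field of fields) keys[field] = "text";

  const existing = indexes.find((index) => index.key && index.key._fts === "text");
  if (existing) {
    // The weights document lists every field covered by the text index.
    const indexed = Object.keys(existing.weights || {}).sort().join(",");
    if (indexed === [...fields].sort().join(",")) return []; // already covered
  }

  const commands = [];
  if (existing) commands.push(`db.${collection}.dropIndex(${JSON.stringify(existing.name)})`);
  commands.push(`db.${collection}.createIndex(${JSON.stringify(keys)})`);
  return commands;
}

const current = [
  { key: { _id: 1 }, name: "_id_" },
  { key: { _fts: "text", _ftsx: 1 }, name: "title_text", weights: { title: 1 } },
];

planTextIndex("posts", current, ["title", "content.brief", "content.extended"])
  .forEach((command) => console.log(command));
// db.posts.dropIndex("title_text")
// db.posts.createIndex({"title":"text","content.brief":"text","content.extended":"text"})
```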

Once the command returns a result with "ok" : 1, the new index has been added.

If you run db.posts.getIndexes(), you can verify that your text index has been added:

"4" : {
    "v" : 1,
    "key" : {
        "_fts" : "text",
        "_ftsx" : 1
    },
    "name" : "title_text_content.brief_text_content.extended_text",
    "ns" : "dake-tech.posts",
    "weights" : {
        "content.brief" : 1,
        "content.extended" : 1,
        "title" : 1
    },
    "default_language" : "english",
    "language_override" : "language",
    "textIndexVersion" : 2
}

If you run the validation again with db.posts.validate(), you can see that the new text index has 31429 keys ("dake-tech.posts.$title_text_content.brief_text_content.extended_text" : 31429), compared with 1464 keys for the previous title-only index.

{
    "ns" : "dake-tech.posts",
    "datasize" : 4.72926e+006,
    "nrecords" : 325,
    "lastExtentSize" : 1.67772e+007,
    "firstExtent" : "0:ef000 ns:dake-tech.posts",
    "lastExtent" : "0:351000 ns:dake-tech.posts",
    "extentCount" : 4,
    "firstExtentDetails" : {
        "loc" : "0:ef000",
        "xnext" : "0:111000",
        "xprev" : "null",
        "nsdiag" : "dake-tech.posts",
        "size" : 8192,
        "firstRecord" : "0:ef0b0",
        "lastRecord" : "0:f0eb0"
    },
    "lastExtentDetails" : {
        "loc" : "0:351000",
        "xnext" : "null",
        "xprev" : "0:151000",
        "nsdiag" : "dake-tech.posts",
        "size" : 1.67772e+007,
        "firstRecord" : "0:3510b0",
        "lastRecord" : "0:591230"
    },
    "deletedCount" : 5,
    "deletedSize" : 1.44095e+007,
    "nIndexes" : 5,
    "keysPerIndex" : {
        "dake-tech.posts.$_id_" : 325,
        "dake-tech.posts.$slug_1" : 325,
        "dake-tech.posts.$state_1" : 325,
        "dake-tech.posts.$publishedDate_1" : 325,
        "dake-tech.posts.$title_text_content.brief_text_content.extended_text" : 31429
    },
    "valid" : true,
    "errors" : [],
    "warning" : "Some checks omitted for speed. use {full:true} option to do more thorough scan.",
    "ok" : 1.0000000000000000
}

Now let's run the same text search query again with the score:

db.posts.find( { $text: { $search: "javascript" } }, { score: { $meta: "textScore" }} )

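This is not what I expected: the first result's title is not related to JavaScript. Two things are worth knowing here. Projecting the score only adds it to each result; it does not sort by it, so you need .sort( { score: { $meta: "textScore" } } ) to see the best matches first. And since content.brief and content.extended are now indexed too, a post can match on its content alone.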
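Here is a sketch of the three pieces a relevance-ranked query needs: the $text filter, the textScore projection, and a sort on the same $meta expression. The helper name rankedTextSearch is hypothetical; the comment shows how the pieces map onto the shell call.

```javascript
// Build the documents for a relevance-ranked text search. Projecting
// { $meta: "textScore" } only adds the score to each result; the sort on the
// same $meta expression is what puts the best matches first.
function rankedTextSearch(search, limit) {
  const score = { $meta: "textScore" };
  return {
    filter: { $text: { $search: search } },
    projection: { score },
    sort: { score },
    limit,
  };
}

const q = rankedTextSearch("javascript", 2);
// Mongo shell equivalent:
// db.posts.find(q.filter, q.projection).sort(q.sort).limit(q.limit)
console.log(JSON.stringify(q.sort)); // {"score":{"$meta":"textScore"}}
```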

How Does MongoDB Define Score for Text Search?

I don't have a full answer for this yet; it's something I want to figure out. In the meantime, once text search is working, you can try the following queries. You can also find MongoDB's documentation here.
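One factor I do know about is field weights. MongoDB's documentation describes the score as, roughly, the number of matches in each indexed field multiplied by that field's weight, summed across fields. In the getIndexes() output above every field has weight 1, so a match in content.extended counts as much as one in title. A sketch of building createIndex() arguments with weights; "weights" and "name" are real createIndex options, but the helper, the weight values, and the index name posts_text are hypothetical and would need tuning against your own data.

```javascript
// Build [keys, options] for a weighted text index. Every field in the
// weights object is indexed as "text"; unlisted fields are not indexed.
function weightedTextIndex(weights, name) {
  const keys = {};
  for (const field of Object.keys(weights)) keys[field] = "text";
  return [keys, { weights, name }];
}

const [keys, options] = weightedTextIndex(
  { title: 10, "content.brief": 5, "content.extended": 1 }, // hypothetical weights
  "posts_text" // hypothetical index name
);
// Shell equivalent (after dropping the existing text index):
// db.posts.createIndex(keys, options)
console.log(JSON.stringify(keys));
// {"title":"text","content.brief":"text","content.extended":"text"}
```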

  1. Search for a word: db.posts.find( { $text: { $search: "javascript" } } )

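  2. Match any of several terms (a logical OR, not a phrase search): db.posts.find( { $text: { $search: "javascript angular react" } } ). To search for an exact phrase, wrap it in escaped double quotes: db.posts.find( { $text: { $search: "\"javascript react\"" } } )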

  3. Exclude documents that contain a term (search for "javascript" and exclude "angular"): db.posts.find( { $text: { $search: "javascript -angular" } } )

  4. Search in a different language: db.posts.find( { $text: { $search: "javascript", $language: "es" } } )

  5. Case-sensitive search (MongoDB 3.2+): db.posts.find( { $text: { $search: "Javascript", $caseSensitive: true } } )

  6. Case-sensitive phrase search (MongoDB 3.2+): db.posts.find( { $text: { $search: "\"Javascript React\"", $caseSensitive: true } } )

  7. Case-sensitive search with an excluded term (MongoDB 3.2+; search for "Javascript" but exclude lowercase "javascript"): db.posts.find( { $text: { $search: "Javascript -javascript", $caseSensitive: true } } )

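  8. Diacritic-sensitive search for a term (MongoDB 3.2+): db.posts.find( { $text: { $search: "café", $diacriticSensitive: true } } )

  9. Diacritic sensitivity with a negated term (MongoDB 3.2+; search for "café" but exclude "cafe" without the accent): db.posts.find( { $text: { $search: "café -cafe", $diacriticSensitive: true } } )

  10. Return the text search score: db.posts.find( { $text: { $search: "javascript" } }, { score: { $meta: "textScore" }} )

  11. Sort by text search score: db.posts.find( { $text: { $search: "javascript" } }, { score: { $meta: "textScore" } } ).sort( { score: { $meta: "textScore" } } )

  12. Return the top 2 matching documents: db.posts.find( { $text: { $search: "javascript" } }, { score: { $meta: "textScore" } } ).sort( { score: { $meta: "textScore" } } ).limit(2)

  13. Text search with additional query and sort expressions: db.posts.find( { state: "published", $text: { $search: "javascript" } }, { score: { $meta: "textScore" } } ).sort( { publishedDate: -1, score: { $meta: "textScore" } } )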
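The $search strings in these queries follow a few simple rules: double-quoted text is a phrase, a leading "-" excludes a term, and the remaining terms are alternatives (a logical OR). The sketch below splits a search string along those rules. It is an illustration, not MongoDB's actual parser: parseSearch is a hypothetical helper, and it ignores details such as negated phrases.

```javascript
// Simplified illustration of how a $search string breaks down:
// quoted phrases, negated terms (leading "-"), and plain OR-ed terms.
function parseSearch(search) {
  const phrases = [];
  // Pull out quoted phrases first, replacing each with a space.
  const rest = search.replace(/"([^"]*)"/g, (_, phrase) => {
    if (phrase.trim()) phrases.push(phrase.trim());
    return " ";
  });
  const terms = [];
  const negated = [];
  for (const token of rest.split(/\s+/)) {
    if (!token || token === "-") continue; // skip empty tokens and stray dashes
    if (token.startsWith("-")) negated.push(token.slice(1));
    else terms.push(token);
  }
  return { terms, phrases, negated };
}

console.log(parseSearch('javascript "React hooks" -angular'));
```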

Conclusion

MongoDB text search is a powerful feature that lets you run full-text queries on string content. Key takeaways: a collection can have only one text index (but it can span multiple fields); text search uses stop words and stemming when indexing content; and since MongoDB 2.6, text search is enabled by default and queried with the $text operator, while the legacy runCommand("text", ...) syntax no longer works in MongoDB 3.0 and above.