How to Set Up MongoDB Text Search with Node.js

2016/3/19, 9 min read

How to use MongoDB text search

MongoDB has provided a text search function since version 2.4. This post shows how to use MongoDB text search.

All the examples below were run on MongoDB 3.0.5. I'm also using Robomongo (0.8.5) to run all the MongoDB commands.

I also have a code snippet showing how to use Mongoose with text search, but I recommend using at least MongoDB 2.6. It seems that 2.4 has some problems with Mongoose.

Stop words and stemming in MongoDB text search

Before jumping into how to use MongoDB text search, we need to understand how MongoDB handles stop words and stemming. Stop words are very common words that the text engine ignores. Let's say you have "I like apple": a word such as "I" is dropped as a stop word, so apple is what you can search for. Stemming reduces words to their root, so for "cook cooking cooks" the MongoDB text engine will index cook.

I like apple, also I enjoy cooking.

In the example above, MongoDB will index apple and cook. There is a very good blog post about this topic, text-search-mongodb-stemming.
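To make the idea concrete, here is a toy sketch of stop-word removal and stemming. It is not MongoDB's real implementation (MongoDB uses its own per-language stop-word lists and a proper stemmer); the tiny stop-word list and the `naiveStem` and `toIndexTerms` helpers are made up for illustration only.

```javascript
// Toy illustration only: the stop-word list and the stemmer below are
// hypothetical and far simpler than what MongoDB actually does.
const STOP_WORDS = new Set(['i', 'also']);

// Very naive stemmer: strips a couple of common English suffixes.
function naiveStem(word) {
  for (const suffix of ['ing', 's']) {
    if (word.endsWith(suffix) && word.length > suffix.length + 2) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Lowercase, split into words, drop stop words, stem, and de-duplicate.
function toIndexTerms(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  const terms = words.filter((w) => !STOP_WORDS.has(w)).map(naiveStem);
  return [...new Set(terms)];
}

console.log(toIndexTerms('I like apple, also I enjoy cooking.'));
// [ 'like', 'apple', 'enjoy', 'cook' ]
console.log(toIndexTerms('cook cooking cooks'));
// [ 'cook' ]
```

With this toy list, "I" and "also" are dropped and "cooking" is reduced to cook. The real engine may drop or stem a different set of words.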

 

Now, let's get started with MongoDB text search

Get MongoDB's version

This command will tell you what version of MongoDB you're using.

db.version()
// 3.0.5

Get the current MongoDB database name

This will return your current database name. Before you run your query, you should check that you're running it against the correct database, right?

db.getName()
// the current database name.

Check the indexes of the current collection

You can run the following command if you want to check whether your collection already has a text index or any other index.

The following example shows how I check which indexes my posts collection has.

// Run this:
db.posts.getIndexes()
// You will get a result like the following. As you can see, this example contains no text index yet.
{
    "0" : {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "name" : "_id_",
        "ns" : "dake-tech.posts"
    },
    "1" : {
        "v" : 1,
        "unique" : true,
        "key" : {
            "slug" : 1
        },
        "name" : "slug_1",
        "ns" : "dake-tech.posts",
        "background" : true,
        "safe" : null
    }
}

Now, we're going to add a text index to this collection. First, a document in the posts collection looks like the following. I'd like to create a text index on content.brief, content.extended and title.


{
    "_id" : ObjectId("535cc04cd06eabf033ee36bc"),
    "__v" : 0,
    "author" : ObjectId("535c4fe25115d24c0c4cf84b"),
    "categories" : [],
    "content" : {
        "brief" : "<p>Looks like&nbsp;http://embed.ly/ is greate site, if I want to create something with URL. It'll get the short version of the url's detail. I have thought about imprement these kind of stuff, but since somebody already create this, why not just use it.&nbsp;</p>\r\n<p>&nbsp;</p>\r\n<p>Nextime, i'll post some detail when I use it.</p>",
        "extended" : ""
    },
    "publishedDate" : ISODate("2014-04-01T07:00:00.000Z"),
    "slug" : "embedly-get-url-link-detail",
    "state" : "published",
    "title" : "embed.ly - Get url link detail"
}

First, let's add a text index to title

db.posts.ensureIndex({title: "text"});

After I created the text index on the title field, I ran db.posts.getIndexes() again to check my indexes and got the following newly added index. Cool, I have added a text index to posts.title.

    "4" : {
        "v" : 1,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "title_text",
        "ns" : "dake-tech.posts",
        "weights" : {
            "title" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 2
    }

MongoDB also has YourCollection.validate(), which validates your collection and its indexes. When I ran db.posts.validate(), I got the following result, which includes my title field's text index. So cool, right? The line "dake-tech.posts.$title_text" : 1464 tells you that the text index on posts.title holds 1464 keys.


{
    "ns" : "dake-tech.posts",
    "datasize" : 4.72926e+006,
    "nrecords" : 325,
    "lastExtentSize" : 1.67772e+007,
    "firstExtent" : "0:ef000 ns:dake-tech.posts",
    "lastExtent" : "0:351000 ns:dake-tech.posts",
    "extentCount" : 4,
    "firstExtentDetails" : {
        "loc" : "0:ef000",
        "xnext" : "0:111000",
        "xprev" : "null",
        "nsdiag" : "dake-tech.posts",
        "size" : 8192,
        "firstRecord" : "0:ef0b0",
        "lastRecord" : "0:f0eb0"
    },
    "lastExtentDetails" : {
        "loc" : "0:351000",
        "xnext" : "null",
        "xprev" : "0:151000",
        "nsdiag" : "dake-tech.posts",
        "size" : 1.67772e+007,
        "firstRecord" : "0:3510b0",
        "lastRecord" : "0:591230"
    },
    "deletedCount" : 5,
    "deletedSize" : 1.44095e+007,
    "nIndexes" : 5,
    "keysPerIndex" : {
        "dake-tech.posts.$_id_" : 325,
        "dake-tech.posts.$slug_1" : 325,
        "dake-tech.posts.$state_1" : 325,
        "dake-tech.posts.$publishedDate_1" : 325,
        "dake-tech.posts.$title_text" : 1464
    },
    "valid" : true,
    "errors" : [],
    "warning" : "Some checks omitted for speed. use {full:true} option to do more thorough scan.",
    "ok" : 1.0000000000000000
}

Now, let's do text search

When you type db.posts.runCommand("text", {search: "mongodb"}), you might get a result like the following.


{
    "ok" : 0.0000000000000000,
    "errmsg" : "no such command: text",
    "code" : 59,
    "bad cmd" : {
        "text" : "posts",
        "search" : "mongodb"
    }
}

What's going on? It doesn't work!

You might try the following: switch to the admin database and enable text search. However, you will still get the same result. The reason is that these commands only work on older MongoDB. The text command was introduced in 2.4 (where text search had to be enabled manually), deprecated in 2.6, and removed in 3.0. If you use MongoDB 3.0 or above, text search is enabled by default and runCommand("text", ...) no longer exists.

use admin
db.runCommand({setParameter: 1, textSearchEnabled: true})

db.posts.runCommand("text", {search: "mongodb"})

{
    "was" : true,
    "ok" : 1.0000000000000000
}
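The version differences can be captured in a small helper. This is only a sketch of my understanding of the version boundaries (the text command in 2.4, the $text operator from 2.6 on); `parseVersion` and `textSearchSyntax` are hypothetical helper names, not part of MongoDB.

```javascript
// Parse a version string such as the one returned by db.version().
function parseVersion(version) {
  const [major = 0, minor = 0, patch = 0] = version.split('.').map(Number);
  return { major, minor, patch };
}

// Decide which text search syntax to use for a given server version.
// Assumed boundaries: no text search before 2.4, the `text` command in 2.4,
// and the $text query operator from 2.6 onwards.
function textSearchSyntax(version) {
  const { major, minor } = parseVersion(version);
  if (major < 2 || (major === 2 && minor < 4)) return 'unsupported';
  if (major === 2 && minor < 6) return 'text-command';
  return '$text-operator';
}

console.log(textSearchSyntax('3.0.5')); // $text-operator
console.log(textSearchSyntax('2.4.9')); // text-command
```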

So, what should I do?

If your MongoDB version is 3.0 or above, like my current MongoDB 3.0.5, you can try the following.


db.posts.find( { $text: { $search: "javascript" } } )
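From Node.js, the same query might look like the sketch below. It assumes `collection` is a collection handle from the official `mongodb` driver (for example the posts collection of the dake-tech database); `findPostsByText` is a hypothetical helper name, and connection setup is left out.

```javascript
// `collection` is assumed to be a collection object from the official
// `mongodb` Node.js driver. The function returns a promise of matching documents.
function findPostsByText(collection, term) {
  return collection.find({ $text: { $search: term } }).toArray();
}

// Usage sketch (assumes `posts` was obtained elsewhere from a connected client):
// findPostsByText(posts, 'javascript').then((docs) => console.log(docs.length));
```

Passing the collection in as a parameter keeps the helper independent of how the connection is created.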

That works. Awesome, right!? I got 12 results.

Now, I'm going to add a text index to the rest of the fields, content.brief and content.extended. So you type db.posts.ensureIndex({'content.brief': 'text'}); and then you get this error. What's going on!!? I can only add one!?


{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 3,
    "errmsg" : "exception: Index with pattern: { _fts: \"text\", _ftsx: 1 } already exists with different options",
    "code" : 85,
    "ok" : 0
}

It turns out that, as of version 3.0.5, a collection can have only one text index. If you want to search multiple fields, you need to put all of them into that single index. First, run db.posts.dropIndex('title_text') to drop the old text index.

This is the result you will get.


{
    "nIndexesWas" : 5,
    "ok" : 1.0000000000000000
}
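Before dropping anything, you may want to check which text index already exists. The sketch below looks for the special { _fts: "text" } key seen in the getIndexes() output earlier; `findTextIndex` is a hypothetical helper, and the sample data is a trimmed copy of that output.

```javascript
// A text index appears in getIndexes() with the key { _fts: 'text', _ftsx: 1 }.
// Object.values() lets this accept either an array or the numbered object
// that Robomongo displays.
function findTextIndex(indexes) {
  return Object.values(indexes).find((ix) => ix.key && ix.key._fts === 'text') || null;
}

const indexes = [
  { v: 1, key: { _id: 1 }, name: '_id_', ns: 'dake-tech.posts' },
  { v: 1, key: { _fts: 'text', _ftsx: 1 }, name: 'title_text', ns: 'dake-tech.posts' },
];

const existing = findTextIndex(indexes);
console.log(existing ? `drop first: ${existing.name}` : 'no text index yet');
// drop first: title_text
```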

Add a text index to multiple fields

Now, you can run the following command.

db.posts.ensureIndex({title: 'text', 'content.brief': 'text', 'content.extended': 'text'});

Once you see a result like the following, the new index has been added.

Inserted 1 record(s) in 27ms

If you run db.posts.getIndexes(), you can verify that your text index has been added.


    "4" : {
        "v" : 1,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "title_text_content.brief_text_content.extended_text",
        "ns" : "dake-tech.posts",
        "weights" : {
            "content.brief" : 1,
            "content.extended" : 1,
            "title" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 2
    }
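The index name in this output is simply the field names and the index type joined with underscores. Here is a small sketch that builds the index specification from a list of fields and reproduces that default name; `textIndexSpec` and `defaultIndexName` are hypothetical helpers written for illustration.

```javascript
// Build an index specification such as { title: 'text', 'content.brief': 'text' }.
function textIndexSpec(fields) {
  const spec = {};
  for (const field of fields) spec[field] = 'text';
  return spec;
}

// Reproduce the default index name: "<field>_<type>" pairs joined by "_".
function defaultIndexName(spec) {
  return Object.entries(spec)
    .map(([field, type]) => `${field}_${type}`)
    .join('_');
}

const spec = textIndexSpec(['title', 'content.brief', 'content.extended']);
console.log(defaultIndexName(spec));
// title_text_content.brief_text_content.extended_text
```

The same `spec` object can be passed to createIndex, which is the newer name for ensureIndex. If you want title matches to count more than content matches, the `weights` index option may help, for example db.posts.createIndex(spec, { weights: { title: 10 } }), where the value 10 is just an assumption for illustration.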

Also, if you run the index validation db.posts.validate(), you can see that the new text index holds 31429 keys: "dake-tech.posts.$title_text_content.brief_text_content.extended_text" : 31429. Compare that with the title-only index, where only 1464 keys were indexed. Now I'd like to run the same text search query again. It's the fun time!!


{
    "ns" : "dake-tech.posts",
    "datasize" : 4.72926e+006,
    "nrecords" : 325,
    "lastExtentSize" : 1.67772e+007,
    "firstExtent" : "0:ef000 ns:dake-tech.posts",
    "lastExtent" : "0:351000 ns:dake-tech.posts",
    "extentCount" : 4,
    "firstExtentDetails" : {
        "loc" : "0:ef000",
        "xnext" : "0:111000",
        "xprev" : "null",
        "nsdiag" : "dake-tech.posts",
        "size" : 8192,
        "firstRecord" : "0:ef0b0",
        "lastRecord" : "0:f0eb0"
    },
    "lastExtentDetails" : {
        "loc" : "0:351000",
        "xnext" : "null",
        "xprev" : "0:151000",
        "nsdiag" : "dake-tech.posts",
        "size" : 1.67772e+007,
        "firstRecord" : "0:3510b0",
        "lastRecord" : "0:591230"
    },
    "deletedCount" : 5,
    "deletedSize" : 1.44095e+007,
    "nIndexes" : 5,
    "keysPerIndex" : {
        "dake-tech.posts.$_id_" : 325,
        "dake-tech.posts.$slug_1" : 325,
        "dake-tech.posts.$state_1" : 325,
        "dake-tech.posts.$publishedDate_1" : 325,
        "dake-tech.posts.$title_text_content.brief_text_content.extended_text" : 31429
    },
    "valid" : true,
    "errors" : [],
    "warning" : "Some checks omitted for speed. use {full:true} option to do more thorough scan.",
    "ok" : 1.0000000000000000
}

Run db.posts.find( { $text: { $search: "javascript" } }, { score: { $meta: "textScore" }} )

Mmmm, this is not what I expected. The first result's title is not related to JavaScript.
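One likely reason is that projecting the score does not sort by it: a $text query returns documents in no particular relevance order unless you also sort on the textScore metadata. The sketch below builds the three pieces of such a query; `scoredTextQuery` is a hypothetical helper name.

```javascript
// Build the filter, projection and sort for a relevance-ordered text query.
function scoredTextQuery(term) {
  return {
    filter: { $text: { $search: term } },
    projection: { score: { $meta: 'textScore' } },
    sort: { score: { $meta: 'textScore' } },
  };
}

const q = scoredTextQuery('javascript');
console.log(JSON.stringify(q.sort)); // {"score":{"$meta":"textScore"}}

// In the mongo shell the pieces would be used like this:
// db.posts.find(q.filter, q.projection).sort(q.sort)
```

With the sort in place, the highest-scoring documents come first, which may or may not put the title you expect at the top, since all three indexed fields contribute to the score.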

How does MongoDB define the score for text search?

I don't have an answer for this yet; it's something I want to figure out. In the meantime, once your text search is enabled, you can try the following. You can also find the details in the MongoDB documentation.

  1. Search for a single word: db.posts.find( { $text: { $search: "javascript" } } )
  2. Search for any of several words (the terms are ORed together): db.posts.find( { $text: { $search: "javascript angular react" } } ). To search for an exact phrase, wrap it in escaped quotes as in item 6.
  3. Exclude documents that contain a term. The following searches for javascript and excludes documents containing angular: db.posts.find( { $text: { $search: "javascript -angular" } } )
  4. Search in a different language: db.posts.find( { $text: { $search: "javascript", $language: "es" } } )
  5. Case-sensitive search (the $caseSensitive option requires MongoDB 3.2 or later): db.posts.find( { $text: { $search: "Javascript", $caseSensitive: true } } )
  6. Case-sensitive search for a phrase: db.posts.find( { $text: { $search: "\"Javascript React\"", $caseSensitive: true } } )
  7. Case-sensitive search with an excluded term. The following searches for Javascript but excludes lower-case javascript: db.posts.find( { $text: { $search: "Javascript -javascript", $caseSensitive: true } } )
  8. Diacritic-sensitive search for a term (the $diacriticSensitive option, MongoDB 3.2 or later)
  9. Diacritic sensitivity with a negated term
  10. Return the text search score: db.posts.find( { $text: { $search: "javascript" } }, { score: { $meta: "textScore" }} )
  11. Sort by text search score
  12. Return the top 2 matching documents
  13. Text search with additional query and sort expressions
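The search string syntax from the list (plain terms, quoted phrases, and terms excluded with a minus sign) can be assembled with a small helper, and items 11 to 13 can be combined into one query. This is only a sketch: `buildSearchString` is a hypothetical helper, and the extra state: "published" condition is an assumption based on the sample document shown earlier.

```javascript
// Assemble a $search string: plain terms, "quoted phrases", and -excluded terms.
function buildSearchString({ terms = [], phrases = [], exclude = [] } = {}) {
  return [
    ...terms,
    ...phrases.map((p) => `"${p}"`),
    ...exclude.map((t) => `-${t}`),
  ].join(' ');
}

const search = buildSearchString({
  terms: ['javascript'],
  phrases: ['angular react'],
  exclude: ['jquery'],
});
console.log(search); // javascript "angular react" -jquery

const query = {
  // Item 13: text search combined with an additional query condition.
  filter: { $text: { $search: search }, state: 'published' },
  projection: { score: { $meta: 'textScore' } },
  // Item 11: sort by text search score.
  sort: { score: { $meta: 'textScore' } },
  // Item 12: return only the top 2 matching documents.
  limit: 2,
};

// In the mongo shell:
// db.posts.find(query.filter, query.projection).sort(query.sort).limit(query.limit)
```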