How to Set Up MongoDB Text Search with Node.js
How to use MongoDB text search
MongoDB has provided text search since version 2.4. This post shows how to use it.
All the examples below were run against MongoDB 3.0.5, and I'm using Robomongo (0.8.5) to run all the MongoDB commands.
I also have a code snippet showing how to use Mongoose with text search, but I recommend using at least MongoDB 2.6, since 2.4 with Mongoose seems to have some problems.
Stop Words, Stemming in MongoDB text search
Before jumping into how to use MongoDB text search, we need to understand how MongoDB handles stop words and stemming. Stop words are common words that the text engine drops instead of indexing: in "I like apple", for example, "I" is a stop word, while "apple" is kept and indexed. Stemming reduces related word forms to a common root, so for "cook cooking cooks" the MongoDB text engine indexes cook.
I like apple, also I enjoy cooking.
For the example above, the terms MongoDB indexes include apple and cook.
There is a very good blog post about this topic: text-search-mongodb-stemming.
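To make the two ideas concrete, here is a toy sketch in plain JavaScript. It is not how MongoDB does it: the tiny STOP_WORDS list and the naiveStem helper are made up for illustration, while MongoDB uses language-specific stop word lists and a real stemmer.

```javascript
// Toy illustration only. STOP_WORDS and naiveStem are hypothetical helpers,
// not MongoDB's actual stop word list or stemming algorithm.
const STOP_WORDS = new Set(["i", "a", "the", "and", "also"]);

function naiveStem(word) {
  // Strip a couple of common English suffixes from longer words.
  return word.length > 4 ? word.replace(/(ing|s)$/, "") : word;
}

function indexTerms(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)                         // split on anything that is not a letter
    .filter((w) => w && !STOP_WORDS.has(w))   // drop empty strings and stop words
    .map(naiveStem);                          // reduce each word to a rough root
}

console.log(indexTerms("I like apple, also I enjoy cooking."));
// [ 'like', 'apple', 'enjoy', 'cook' ]
```

With this toy version, "cook", "cooks", and "cooking" all end up as the same term, which is the whole point of stemming.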
Now, let's get started with MongoDB text search.
Get MongoDB's version
This command will tell you what version of MongoDB you're using.
db.version() // 3.0.5
Get Current MongoDB database name
This returns the name of your current database. Before you run a query, you want to check that you're running it against the correct database, right?
db.getName() // the current database name.
Check the indexes of the current collection
You can run the following command to check whether your collection already has a text index or any other index.
The following example shows how I check whether my Posts collection contains any indexes.
// Run this:
db.posts.getIndexes()
// You will get a result like the following, or an empty result if there are no indexes. As you can see, this example contains no text index.
{
"0" : {
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "dake-tech.posts"
},
"1" : {
"v" : 1,
"unique" : true,
"key" : {
"slug" : 1
},
"name" : "slug_1",
"ns" : "dake-tech.posts",
"background" : true,
"safe" : null
}
}
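If you want to do this check in code instead of reading the output by eye, a small helper like the following could work. It is only a sketch: findTextIndex is a name I made up, and it relies on a text index having the value "text" somewhere in its key, which is what getIndexes() shows later in this post.

```javascript
// Return the text index from getIndexes() output, or null if there is none.
// Object.values lets this accept a real array as well as the
// { "0": ..., "1": ... } shape that Robomongo displays.
function findTextIndex(indexes) {
  const list = Object.values(indexes);
  return list.find((ix) => Object.values(ix.key).includes("text")) || null;
}

// Sample data shaped like the output above (trimmed to name and key).
const before = [
  { name: "_id_", key: { _id: 1 } },
  { name: "slug_1", key: { slug: 1 } },
];
// The same collection once a text index on title exists.
const after = before.concat([
  { name: "title_text", key: { _fts: "text", _ftsx: 1 } },
]);

console.log(findTextIndex(before));     // null
console.log(findTextIndex(after).name); // title_text
```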
Now, we're going to add a text index to this collection. First, a document in the Posts collection looks like the following. I'd like to create a text index on content.brief, content.extended, and title.
{
"_id" : ObjectId("535cc04cd06eabf033ee36bc"),
"__v" : 0,
"author" : ObjectId("535c4fe25115d24c0c4cf84b"),
"categories" : [],
"content" : {
"brief" : "<p>Looks like http://embed.ly/ is greate site, if I want to create something with URL. It'll get the short version of the url's detail. I have thought about imprement these kind of stuff, but since somebody already create this, why not just use it. </p>\r\n<p> </p>\r\n<p>Nextime, i'll post some detail when I use it.</p>",
"extended" : ""
},
"publishedDate" : ISODate("2014-04-01T07:00:00.000Z"),
"slug" : "embedly-get-url-link-detail",
"state" : "published",
"title" : "embed.ly - Get url link detail"
}
First, let's add a text index to title.
db.posts.ensureIndex({title: "text"});
After creating the text index on the title field, I ran db.posts.getIndexes() again to check my indexes and got the following newly added index. Cool, I have added a text index to my Posts.Title.
"4" : {
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "title_text",
"ns" : "dake-tech.posts",
"weights" : {
"title" : 1
},
"default_language" : "english",
"language_override" : "language",
"textIndexVersion" : 2
}
MongoDB also has db.collection.validate(), which validates a collection and its indexes. When I ran db.posts.validate(), I got the following result, which includes my title field's text index. Cool, right? The line "dake-tech.posts.$title_text" : 1464 tells you that the text index on post.title contains 1464 keys.
{
"ns" : "dake-tech.posts",
"datasize" : 4.72926e+006,
"nrecords" : 325,
"lastExtentSize" : 1.67772e+007,
"firstExtent" : "0:ef000 ns:dake-tech.posts",
"lastExtent" : "0:351000 ns:dake-tech.posts",
"extentCount" : 4,
"firstExtentDetails" : {
"loc" : "0:ef000",
"xnext" : "0:111000",
"xprev" : "null",
"nsdiag" : "dake-tech.posts",
"size" : 8192,
"firstRecord" : "0:ef0b0",
"lastRecord" : "0:f0eb0"
},
"lastExtentDetails" : {
"loc" : "0:351000",
"xnext" : "null",
"xprev" : "0:151000",
"nsdiag" : "dake-tech.posts",
"size" : 1.67772e+007,
"firstRecord" : "0:3510b0",
"lastRecord" : "0:591230"
},
"deletedCount" : 5,
"deletedSize" : 1.44095e+007,
"nIndexes" : 5,
"keysPerIndex" : {
"dake-tech.posts.$_id_" : 325,
"dake-tech.posts.$slug_1" : 325,
"dake-tech.posts.$state_1" : 325,
"dake-tech.posts.$publishedDate_1" : 325,
"dake-tech.posts.$title_text" : 1464
},
"valid" : true,
"errors" : [],
"warning" : "Some checks omitted for speed. use {full:true} option to do more thorough scan.",
"ok" : 1.0000000000000000
}
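One quick way to read these numbers is to divide the key count by the number of records. The helper below is just a sketch with a made-up name (keysPerDocument); it only uses the nrecords and keysPerIndex fields shown in the output above.

```javascript
// Average number of index keys per document, from a validate() result.
function keysPerDocument(validateResult, indexName) {
  const keys = validateResult.keysPerIndex[indexName];
  return keys / validateResult.nrecords;
}

// Values copied from the validate() output above.
const validateResult = {
  nrecords: 325,
  keysPerIndex: { "dake-tech.posts.$title_text": 1464 },
};

console.log(keysPerDocument(validateResult, "dake-tech.posts.$title_text").toFixed(1));
// prints 4.5
```

So each title contributes between four and five keys on average, which sounds reasonable for short titles.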
Now, let's do a text search
When you type db.posts.runCommand("text", {search: "mongodb"}), you might get a result like the following.
{
"ok" : 0.0000000000000000,
"errmsg" : "no such command: text",
"code" : 59,
"bad cmd" : {
"text" : "posts",
"search" : "mongodb"
}
}
What's going on? It doesn't work!
You might try the following: switch to the admin database and enable text search. However, you will still get the same result. It looks like these commands only work on older versions of MongoDB, maybe 2.4 to 2.6. If you use a recent MongoDB (3.0 or above), text search is enabled by default and runCommand("text", ...) no longer exists.
use admin
db.runCommand({setParameter: 1, textSearchEnabled: true})
db.posts.runCommand("text", {search: "mongodb"})
{ "was" : true, "ok" : 1.0000000000000000 }
So, what do I do?
If your MongoDB version is 3.0 or above, like my current MongoDB 3.0.5, you can try the following.
db.posts.find( { $text: { $search: "javascript" } } )
Awesome, right!? I got 12 results back.
Now I'm going to add a text index to the rest of the fields, content.brief and content.extended. So you type db.posts.ensureIndex({'content.brief': 'text'}); and then you get this error. What's going on!? I can only add one!?
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 3,
"errmsg" : "exception: Index with pattern: { _fts: \"text\", _ftsx: 1 } already exists with different options",
"code" : 85,
"ok" : 0
}
The reason is that, as of version 3.0.5, a collection can have only one text index. If you want to make multiple fields searchable, you can do the following.
First, run db.posts.dropIndex('title_text') to drop the old text index.
This is the result you will get.
{
"nIndexesWas" : 5,
"ok" : 1.0000000000000000
}
Add a text index to multiple fields
Now, you can run the following command.
db.posts.ensureIndex({title: 'text', 'content.brief': 'text',
'content.extended': 'text'});
Once it shows a result like the following, the new index has been added.
Inserted 1 record(s) in 27ms
If you run db.posts.getIndexes(), you can verify that your text index has been added.
"4" : {
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"name" : "title_text_content.brief_text_content.extended_text",
"ns" : "dake-tech.posts",
"weights" : {
"content.brief" : 1,
"content.extended" : 1,
"title" : 1
},
"default_language" : "english",
"language_override" : "language",
"textIndexVersion" : 2
}
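Since the title of this post mentions Node.js, here is how the same drop-and-recreate steps might look from application code. This is a sketch, not the only way to do it: replaceTextIndex is a name I made up, and collection is assumed to be a Collection object from the official mongodb Node.js driver, whose indexes(), dropIndex(), and createIndex() methods return promises.

```javascript
// Drop any existing text index, then create one text index over `fields`.
// `collection` is assumed to be a Node.js driver Collection (e.g. for "posts").
async function replaceTextIndex(collection, fields) {
  const indexes = await collection.indexes();
  for (const ix of indexes) {
    // A text index has the value "text" in its key, e.g. { _fts: "text", _ftsx: 1 }.
    if (Object.values(ix.key).includes("text")) {
      await collection.dropIndex(ix.name);
    }
  }
  // Build { title: "text", "content.brief": "text", ... }.
  const spec = {};
  for (const field of fields) {
    spec[field] = "text";
  }
  return collection.createIndex(spec); // resolves to the new index name
}

// Usage, assuming `db` is a connected driver Db instance:
// await replaceTextIndex(db.collection("posts"),
//   ["title", "content.brief", "content.extended"]);
```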
Also, if you run the index validation db.posts.validate(), you can see that the new text index contains 31429 keys.
"dake-tech.posts.$title_text_content.brief_text_content.extended_text" : 31429
Compare that with before, when only title was indexed and there were 1464 keys. Now I'd like to run the same text search query again. It's fun time!!
{
"ns" : "dake-tech.posts",
"datasize" : 4.72926e+006,
"nrecords" : 325,
"lastExtentSize" : 1.67772e+007,
"firstExtent" : "0:ef000 ns:dake-tech.posts",
"lastExtent" : "0:351000 ns:dake-tech.posts",
"extentCount" : 4,
"firstExtentDetails" : {
"loc" : "0:ef000",
"xnext" : "0:111000",
"xprev" : "null",
"nsdiag" : "dake-tech.posts",
"size" : 8192,
"firstRecord" : "0:ef0b0",
"lastRecord" : "0:f0eb0"
},
"lastExtentDetails" : {
"loc" : "0:351000",
"xnext" : "null",
"xprev" : "0:151000",
"nsdiag" : "dake-tech.posts",
"size" : 1.67772e+007,
"firstRecord" : "0:3510b0",
"lastRecord" : "0:591230"
},
"deletedCount" : 5,
"deletedSize" : 1.44095e+007,
"nIndexes" : 5,
"keysPerIndex" : {
"dake-tech.posts.$_id_" : 325,
"dake-tech.posts.$slug_1" : 325,
"dake-tech.posts.$state_1" : 325,
"dake-tech.posts.$publishedDate_1" : 325,
"dake-tech.posts.$title_text_content.brief_text_content.extended_text" : 31429
},
"valid" : true,
"errors" : [],
"warning" : "Some checks omitted for speed. use {full:true} option to do more thorough scan.",
"ok" : 1.0000000000000000
}
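Run:
db.posts.find( { $text: { $search: "javascript" } }, { score: { $meta: "textScore" } } )
Hmm, this is not what I expected: the first result's title is not related to JavaScript. Note that projecting the score does not sort by it, so the results do not come back ordered by relevance.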
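To get the most relevant posts first, you have to sort on the same textScore metadata. Here is a sketch of that from Node.js. The function name searchPosts is made up, and collection is assumed to be a Collection from the official mongodb Node.js driver, using its find(), project(), sort(), limit(), and toArray() cursor methods.

```javascript
// Return the best-matching posts first.
// `collection` is assumed to be a driver Collection that has a text index.
async function searchPosts(collection, term, limit = 10) {
  const score = { $meta: "textScore" };
  return collection
    .find({ $text: { $search: term } })
    .project({ title: 1, score })   // expose the relevance score on each result
    .sort({ score })                // sort by relevance, highest score first
    .limit(limit)
    .toArray();
}

// Usage, assuming `db` is a connected driver Db instance:
// const top2 = await searchPosts(db.collection("posts"), "javascript", 2);
```

In the mongo shell, the equivalent is to append .sort( { score: { $meta: "textScore" } } ) to the find() call. The weights shown in the index definition also feed into the score, so a field with a higher weight counts for more.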
How does MongoDB define the score for text search?
I don't have an answer for this yet; it's something I want to figure out. In the meantime, once your text search is enabled, you can try the following. You can also find the details in the MongoDB documentation.
- Search for a word
db.posts.find( { $text: { $search: "javascript" } } )
- Search for any of several terms (space-separated terms are combined with OR; to match an exact phrase, wrap it in escaped double quotes, as in the case-sensitive phrase example below)
db.posts.find( { $text: { $search: "javascript angular react" } } )
- Exclude documents that contain a term; the following searches for javascript and excludes documents that contain angular
db.posts.find( { $text: { $search: "javascript -angular" } } )
- Search in a different language
db.posts.find( { $text: { $search: "javascript", $language: "es" } } )
- Case-sensitive search (the $caseSensitive option requires MongoDB 3.2 or later)
db.posts.find( { $text: { $search: "Javascript", $caseSensitive: true } } )
- Case-sensitive search for a phrase
db.posts.find( { $text: { $search: "\"Javascript React\"", $caseSensitive: true } } )
- Case-sensitive search with an excluded term; the following searches for Javascript but excludes lower-case javascript
db.posts.find( { $text: { $search: "Javascript -javascript", $caseSensitive: true } } )
- Diacritic Sensitive Search for a Term
- Diacritic Sensitivity with Negated Term
- Return the Text Search Score
db.posts.find( { $text: { $search: "javascript" } }, { score: { $meta: "textScore" } } )
- Sort by Text Search Score
- Return Top 2 Matching Documents
- Text Search with Additional Query and Sort Expressions
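The $search strings in the list above follow a simple pattern: plain terms, phrases wrapped in double quotes, and excluded terms prefixed with a minus sign. If you build them from user input, a pair of small helpers like these could keep it tidy. Both function names are made up for this sketch, and the state field in the usage comes from the sample post document shown earlier.

```javascript
// Build a $search string from terms, exact phrases, and excluded terms.
function buildSearchString({ terms = [], phrases = [], exclude = [] }) {
  const parts = [
    ...terms,
    ...phrases.map((p) => `"${p}"`), // phrases must be wrapped in double quotes
    ...exclude.map((t) => `-${t}`),  // a leading minus negates a term
  ];
  return parts.join(" ");
}

// Combine a text search with additional query conditions.
function buildTextFilter(search, extra = {}) {
  return { ...extra, $text: { $search: search } };
}

console.log(buildSearchString({ terms: ["javascript"], exclude: ["angular"] }));
// javascript -angular
console.log(buildSearchString({ phrases: ["Javascript React"] }));
// "Javascript React"
console.log(JSON.stringify(buildTextFilter("javascript", { state: "published" })));
// {"state":"published","$text":{"$search":"javascript"}}
```

The object returned by buildTextFilter can be passed straight to db.posts.find(), which covers the "additional query" case from the list.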