
MongoDB Cookbook

Counting Tags

Credit: Kristina Chodorow

Problem

You want to create a tag cloud or see what the most popular tags are in a given collection, say, "posts". Each document in the collection has an array of tags, such as:

{
    "title" : "A blog post",
    "author" : "Kristina",
    "content" : "...",
    "tags" : ["MongoDB", "Map/Reduce", "Recipe"]
}

We want to end up with a "tags" collection whose documents pair each tag (the "_id") with the number of times it appears (the "value"):

{"_id" : "MongoDB", "value" : 4}
{"_id" : "Map/Reduce", "value" : 2}
{"_id" : "Recipe", "value" : 7}
{"_id" : "Group", "value" : 1}

Solution

Use the mapreduce database command. Emit each tag in the map function, then count them in the reduce function.

1. Map

The map function first checks that the document has a tags field, as looping over an undefined field could cause an error. Once that has been established, we go through each element, emitting the tag name and a count of 1:

map = function() {
    // Skip documents that have no tags field.
    if (!this.tags) {
        return;
    }

    // Emit each tag with a count of 1.
    for (var i = 0; i < this.tags.length; i++) {
        emit(this.tags[i], 1);
    }
}
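To see what the map function produces without a database, it can be run against a single document with a stand-in emit. This is a minimal sketch in plain JavaScript: the emitted array and the local emit function are assumptions made for illustration, since MongoDB supplies the real emit during a map/reduce job.

```javascript
// Stand-in for MongoDB's emit(): record each key/value pair in an array.
var emitted = [];
function emit(key, value) {
    emitted.push({ key: key, value: value });
}

// Same logic as the recipe's map function.
var map = function() {
    if (!this.tags) {
        return;
    }
    for (var i = 0; i < this.tags.length; i++) {
        emit(this.tags[i], 1);
    }
};

// MongoDB calls map with `this` bound to each document; call() imitates that.
var post = { title: "A blog post", tags: ["MongoDB", "Map/Reduce", "Recipe"] };
map.call(post);
map.call({ title: "Untagged post" });  // no tags field, so nothing is emitted

// emitted now holds one { key: <tag>, value: 1 } pair per tag of the first post.
```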

2. Reduce

The reduce function receives a tag as its first argument and an array of the counts emitted for that tag as its second. We initialize a counter to 0, add each element of the array to it, and return the final count.

reduce = function(key, values) {
    var count = 0;

    // values holds the counts emitted for this tag.
    for (var i = 0; i < values.length; i++) {
        count += values[i];
    }

    return count;
}
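MongoDB may call the reduce function more than once for the same key, feeding the output of an earlier call back in as one of the values, so the function has to give the same total however the values are grouped. A small check in plain JavaScript, using made-up input values, illustrates this property for a summing reducer:

```javascript
// Summing reducer: first argument is the tag, second is an array of counts.
var reduce = function(key, values) {
    var count = 0;
    for (var i = 0; i < values.length; i++) {
        count += values[i];
    }
    return count;
};

// Reduce all four counts in one call...
var allAtOnce = reduce("MongoDB", [1, 1, 1, 1]);

// ...or reduce two batches and then reduce the partial results.
var partialA = reduce("MongoDB", [1, 1]);
var partialB = reduce("MongoDB", [1, 1]);
var inStages = reduce("MongoDB", [partialA, partialB]);

// Both paths give 4, which is why summing works and counting values.length would not.
```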

3. Call the mapreduce command

We want to put the results in the "tags" collection, so we'll specify that with the out parameter:

> result = db.runCommand({
... "mapreduce" : "posts",
... "map" : map,
... "reduce" : reduce,
... "out" : "tags"})
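Conceptually, the command runs map over every document, groups the emitted values by key, and passes each group to reduce. The following plain-JavaScript sketch imitates that flow in memory. The posts array, the groups object, and the local emit are illustrative stand-ins, not part of MongoDB's API, and the real command may also split a key's values across several reduce calls.

```javascript
// Hypothetical in-memory stand-in for the "posts" collection.
var posts = [
    { title: "First",  tags: ["MongoDB", "Recipe"] },
    { title: "Second", tags: ["MongoDB", "Map/Reduce"] },
    { title: "Third" }  // no tags field
];

// Stand-in emit(): collect the emitted values under their key.
var groups = {};
function emit(key, value) {
    if (!Object.prototype.hasOwnProperty.call(groups, key)) {
        groups[key] = [];
    }
    groups[key].push(value);
}

var map = function() {
    if (!this.tags) { return; }
    for (var i = 0; i < this.tags.length; i++) { emit(this.tags[i], 1); }
};

var reduce = function(key, values) {
    var count = 0;
    for (var i = 0; i < values.length; i++) { count += values[i]; }
    return count;
};

// Map phase: run map with `this` bound to each document.
posts.forEach(function(post) { map.call(post); });

// Reduce phase: one output document per distinct tag.
var tags = Object.keys(groups).map(function(key) {
    return { _id: key, value: reduce(key, groups[key]) };
});
// tags: MongoDB -> 2, Recipe -> 1, Map/Reduce -> 1
```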

Now, if we query the tags collection, we find:

> db.tags.find()
{"_id" : "MongoDB", "value" : 4}
{"_id" : "Map/Reduce", "value" : 2}
{"_id" : "Recipe", "value" : 7}
{"_id" : "Group", "value" : 1}
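
For the "most popular tags" part of the problem, the output documents can be ordered by their value field. In the shell this would typically be done with a sort on the cursor, such as db.tags.find().sort({"value" : -1}). The sketch below does the same ordering in plain JavaScript on a copy of the results shown above, so it can be run without a database; the variable names are illustrative.

```javascript
// The documents returned by db.tags.find() in the recipe.
var tagDocs = [
    { _id: "MongoDB", value: 4 },
    { _id: "Map/Reduce", value: 2 },
    { _id: "Recipe", value: 7 },
    { _id: "Group", value: 1 }
];

// Copy the array, then sort by count, highest first.
var byPopularity = tagDocs.slice().sort(function(a, b) {
    return b.value - a.value;
});

// Keep only the tag names, most popular first.
var names = byPopularity.map(function(doc) { return doc._id; });
// names is ["Recipe", "MongoDB", "Map/Reduce", "Group"]
```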
