Fork me on GitHub

MongoDB Cookbook MongoDB Cookbook

Finding Max And Min Values with Versioned Documents

Credit: Amos King

Problem

You want to list the latest version numbers of a set of documents. Each document contains a field that represents the version of the document and a field representing which document that is a version of:

{
    "document_id" : "mongoDB How-To",
    "author" : "Amos King",
    "content" : "...",
    "version" : 1.0
}

We want to end up with a collection of document_ids and their largest version number:

{"_id" : "mongoDB How To", "value" : 1.1}
{"_id" : "Resume", "value" : 6}
{"_id" : "Schema", "value" : 1}

Solution

Use the mapreduce database command. Emit each document_id and version in the map function, then use the reduce function to find the max version.

1. Map

The map function is very simple. We use our common element between all versions as the key and the version as the value:

map = function () {
    emit(this.document_id, this.version);
}

2. Reduce

The reduce function is also very simple but has a little bit of javascript magic. Math.max normally takes in any number of arguments(ie. Math.max(1,2,3) ), but we need to call it with an array. So we call Math.max with apply so that we can pass in an array of values to max. The apply breaks the array into individual arguments to pass to Math.max. The first argument to apply is the context in which we want to run; Math will do fine here.

reduce = function (key, values) {
    return Math.max.apply(Math, values);
}

Finding the minimum value is as easy as replacing Math.max with Math.min.

3. Call the mapreduce command

Now it's time to get our result set. We'll set the output collection name parameter to 'newest_versions' so that we'll have an appropriately named set to work with:

> result = db.runCommand({
... "mapreduce" : "documents",
... "map" : map,
... "reduce" : reduce,
... "out" : "newest_versions"})

Now, we query the 'newest_versions' collection. Each document is exactly what we're looking for:

> db.newest_versions.find()
{"_id" : "mongoDB How To", "value" : 1.1}
{"_id" : "Resume", "value" : 6}
{"_id" : "Schema", "value" : 1}

Extras

The Map and Reduce Functions can be rewritten slightly to return the Maximum and Minimum versions of each document.

For the purpose of this example, the input collection is as follows: (The _id values have been truncated for brevity.)

> db.documents.find()
{ "_id" : 1, "document_id" : "mongoDB How-To", "author" : "Amos King", "content" : "...", "version" : 1 }
{ "_id" : 2, "document_id" : "mongoDB How-To", "author" : "Amos King", "content" : "...", "version" : 1.1 }
{ "_id" : 3, "document_id" : "Resume", "author" : "Author", "content" : "...", "version" : 6 }
{ "_id" : 4, "document_id" : "Schema", "author" : "Someone Else", "content" : "...", "version" : 0.9 }
{ "_id" : 5, "document_id" : "Schema", "author" : "Someone Else", "content" : "...", "version" : 1 }
> 

Map

The new Map function emits documents containing the document_id, and "value" key containing a list of embedded documents, each containing the keys, "max" and "min". Both keys are initially set to be equal to the "version" key of the current document. Because there is only one document containing the "document_id" : "Resume", this output will not need to be reduced.

map = function () { 
    emit(this.document_id, {max:this.version, min:this.version}); 
}

The Map function will emit something that looks like the following:

"mongoDB How-To", { "max" : 1, "min" : 1 }
"mongoDB How-To", { "max" : 1.1, "min" : 1.1 }
"Resume", { "max" : 6, "min" : 6 }
"Schema", { "max" : 0.9, "min" : 0.9 }
"Schema", { "max" : 1, "min" : 1 }

Reduce

Next the Reduce function will be run to compress the data emit by the Map function. The Reduce function requires an input of an id, and a list of values. It must output an id and a single value, which in this case is a document containing the keys, "max" and "min". The reduce function will interpret the data that has been emitted from the Map function as follows:

"mongoDB How-To", [{ "max" : 1, "min" : 1 }, { "max" : 1.1, "min" : 1.1 }]
"Schema", [{ "max" : 0.9, "min" : 0.9 }, { "max" : 1, "min" : 1 }]

The Reduce function will be run repeatedly, passing its previous output value as the new input, until the output list contains only one value.

Notice that the id "Resume" is not passed to the Reduce function, because it only has one value associated with it. This reduce function will find the maximum "max" value, and the minimum "min" value for each key. It will be run twice; once for the id "Schema", and once for the id "mongoDB How-To".

reduce = function (key, values) {
  max = values[0].max;
    min = values[0].min;
    if (values.length > 1){
        for(i in values){
            if(values[i].max > max){
                max = values[i].max;
            };
            if(values[i].min < min){
                min = values[i].min;
            };
        };
    };
    return {"max":max, "min":min};
    }
}

Running mapreduce will return the following:

> result = db.runCommand({"mapreduce" : "documents","map" : map,"reduce" : reduce,"out" : "newest_versions"})
{
    "result" : "newest_versions",
    "timeMillis" : 2,
    "counts" : {
        "input" : 5,
        "emit" : 5,
        "reduce" : 2,
        "output" : 3
    },
    "ok" : 1
}
> db.newest_versions.find()
{ "_id" : "Resume", "value" : { "max" : 6, "min" : 6 } }
{ "_id" : "Schema", "value" : { "max" : 1, "min" : 0.9 } }
{ "_id" : "mongoDB How-To", "value" : { "max" : 1.1, "min" : 1 } }

See Also

blog comments powered by Disqus