MongoDB Aggregation CheatSheet
Sample cheatsheet with common aggregation pipeline operators examples
Data Access
Data can be accessed by single keys, ranges, faceted search, graph traversals, and geospatial queries through to complex aggregations and analytics.
# Aggregation Framework
- Basically we run aggregate() instead of find()
- Source docs are chain transformed into the final docs (eg. filtering, processing via operators)
- Aggregate() Inspects docs and collects them
- Aggregate() HAS an accumulator memory
- Find() DOES NOT HAVE a memory
- Aggregate() is a chain of pipeline operators (each pipeline op, only sees docs passed from the previous pipeline object)## Aggregation (pipeline operations / transformations)
- Better than find(), because it can changed the type of return documents.
- Inspects documents and fields
- Has memory to perform accumulator based operations
Basics
var pipeOp1 = '..', pipeOp2 = '..', pipeOp3 = '..';
db.MyCollection.aggregate(pipeOp1,pipeOp2,pipeOp3);
// PipeOp1 > return docs; > PipeOp2 > return docs; > PipeOp3 > return docs;
Pipeline Operators
$match - FILTER incoming documents to match a criteria $group - GROUPS docs based on a common key (eg. group all docs that have color: red) $unwind - EXPANDS into multiple documents a single document with an array of id elements $project - PLUCK fields from the documents
$limit - limits the number of documents on the output $skip - skips the first amount of documents $sort - outputs documents in order $geoNear - geography operator (which makes use of other pipeline operators)
Examples
-
Group {$group: {_id: 'all', x:{ $sum: 1 }}} // {
KEY_FIELD, VALUE_FIELD} -
Count $pipeop1 = { $group: {_id:null, count: {$sum:1}}}; $pipe1 = { $group: { _id: "all", "totalDocs": {$sum:1} }}; db.banners.aggregate([$pipeop1]); // Alternative: db.impressions.countt()
-
Min Max and Average $min, $max, $average db.impressions.aggregate([{$group: {_id: "all", min: {$min : "$pathMeta.CampID"} } }]);
-
Benchmark Benchmark *first 10 million $pipeLimit = { $limit: 10000000 }; // 10 million db['impressions'].aggregate( [$pipeLimit], { allowDiskUse: true } );
-
Projection $pipeLimit = { $limit: 100000000 }; // 10 million $pipeProject = { $project: { path: 1, _id: 0 } }; $pipeSortCompoundDates = { $sort: { 'date': 1 } }; db['impressions'].aggregate( [$pipeProject], { allowDiskUse: true } );
-
Count all docs Info: 1 million docs = 17s var $countOp = { $group: { "_id": "all", "sum": {"$sum":1} } } db['impressions-150518'].aggregate($countOp);
-
Group by field var $groupOp = { $group: { "_id": "$pathMeta", "sum": {"$sum":1} } } db['impressions-150518'].aggregate($groupOp);
-
Max Operator var $maxOp = { $group: { "_id": "all", "max": {$max: "$sum"} } } db['impressions-150518'].aggregate($maxOp);
-
Takes specific field value and adds the results array for the group, * Distinct members only var $addToSet = { $group: { "_id": "$pathMeta.CampID", "referrer": {"$addToSet": "$referrer"} } } db['impressions-150518'].aggregate([ $addToSet]);
-
Takes specific field value and adds the results array for the group, * repeated members)) var $addToSet = { $group: { "_id": "$pathMeta.CampID", "referrer": {"$push": "$referrer"} } } db['impressions-150518'].aggregate([ $addToSet]);
-
Regex return model.aggregate() .match({ 'path': { $regex: new RegExp(experimentId, 'i') }} )
Performance Recommendations
- if $sort is the FIRST in pipeline it CAN make use of indexing (original docs)
- $group and $project return new documents which CANNOT index to the original collection docs
- if $sort is used after $group, $project (ie, not the first) it CANNOT use indexing, because $group and $project return new documents which don't index to the original collection docs
- Indexes speed dramatically the speed of $sort.
- $sort SHOULD only be used after $match because sort needs to run throughout all docs
- $match filters as the docs as they pass along the pipe
- $limit doesn't allow more than x docs to pass on that pipe
- Projecting in the beginning of the pipeline takes load from the rest of the processing
Ids
{ "$match": { "_id": { "$in": ids } } }
Created on 2/10/2019