$group (aggregation)
Definition
$group
The $group stage combines multiple documents with the same field, fields, or expression into a single document according to a group key. The result is one document per unique group key.
A group key is often a field, or group of fields. The group key can also be the result of an expression. Use the _id field in the $group pipeline stage to set the group key. See below for usage examples.
In the $group stage output, the _id field is set to the group key for that document.
The output documents can also contain additional fields that are set using accumulator expressions.
Note
$group does not order its output documents.
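The grouping behavior described above can be sketched in plain JavaScript. This is an in-memory illustration only, not how MongoDB implements the stage; `groupDocs`, `keyFn`, and the `init`/`step` accumulator shape are hypothetical names invented for the sketch.

```javascript
// In-memory sketch of $group semantics: one output document per unique key.
// groupDocs, keyFn, and the init/step accumulator shape are hypothetical.
function groupDocs(docs, keyFn, accumulators) {
  const groups = new Map(); // serialized group key -> output document
  for (const doc of docs) {
    const key = keyFn(doc);
    const id = JSON.stringify(key);
    if (!groups.has(id)) {
      const fresh = { _id: key };
      for (const [field, acc] of Object.entries(accumulators)) {
        fresh[field] = acc.init();
      }
      groups.set(id, fresh);
    }
    const out = groups.get(id);
    for (const [field, acc] of Object.entries(accumulators)) {
      out[field] = acc.step(out[field], doc);
    }
  }
  return [...groups.values()];
}

const docs = [
  { item: "abc", quantity: 2 },
  { item: "xyz", quantity: 10 },
  { item: "abc", quantity: 5 },
];

// Rough analogue of { $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } }
const grouped = groupDocs(docs, (d) => d.item, {
  totalQuantity: { init: () => 0, step: (sum, d) => sum + d.quantity },
});
console.log(grouped);
```

The sketch happens to return groups in first-seen order because it uses a `Map`. The real stage makes no such promise, so add a $sort stage after $group when order matters.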
Compatibility
You can use $group for deployments hosted in the following environments:
- MongoDB Atlas: The fully managed service for MongoDB deployments in the cloud
- MongoDB Enterprise: The subscription-based, self-managed version of MongoDB
- MongoDB Community: The source-available, free-to-use, and self-managed version of MongoDB
Syntax
The $group stage has the following form:
{
  $group:
    {
      _id: <expression>, // Group key
      <field1>: { <accumulator1> : <expression1> },
      ...
    }
}
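Field | Description |
---|---|
_id | Required. The _id expression specifies the group key. If you specify an _id value of null, or any other constant value, the $group stage returns a single document that aggregates values across all of the input documents. |
field | Optional. Computed using the accumulator operators. |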
The _id and the accumulator operators can accept any valid expression. For more information on expressions, see Expression Operators.
Considerations
Performance
$group is a blocking stage: the pipeline must wait until the stage has received all of its input data before the stage can process that data and pass results on. A blocking stage may reduce performance because it reduces parallel processing for a pipeline with multiple stages. A blocking stage may also use substantial amounts of memory for large data sets.
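To make the blocking behavior concrete, here is a small generator-based sketch. It is an illustration under simplified assumptions (a single summing accumulator, in-memory input), and `source` and `groupStage` are hypothetical names, not MongoDB internals.

```javascript
const log = [];

// Hypothetical input stage that records when each document is read.
function* source() {
  const docs = [
    { item: "abc", quantity: 2 },
    { item: "xyz", quantity: 10 },
    { item: "abc", quantity: 5 },
  ];
  for (const doc of docs) {
    log.push(`read ${doc.item}`);
    yield doc;
  }
}

// Blocking stage: it must drain its input before emitting anything,
// because any later document could still change a group's total.
function* groupStage(input, keyField, sumField) {
  const totals = new Map(); // one entry per group is held in memory
  for (const doc of input) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) ?? 0) + doc[sumField]);
  }
  for (const [key, total] of totals) {
    yield { _id: key, total };
  }
}

for (const out of groupStage(source(), "item", "quantity")) {
  log.push(`emit ${out._id}`);
}
console.log(log);
```

Every `read` entry is logged before the first `emit` entry, and the `totals` map grows with the number of distinct group keys, which is where the memory cost comes from.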
Accumulator Operator
The <accumulator> operator must be one of the following accumulator operators:
Changed in version 5.0.
Name | Description |
---|---|
$accumulator | Returns the result of a user-defined accumulator function. |
$addToSet | Returns an array of unique expression values for each group. Order of the array elements is undefined. Changed in version 5.0: Available in the $setWindowFields stage. |
$avg | Returns an average of numerical values. Ignores non-numeric values. Changed in version 5.0: Available in the $setWindowFields stage. |
$bottom | Returns the bottom element within a group according to the specified sort order. New in version 5.2. Available in the $group and $setWindowFields stages. |
$bottomN | Returns an aggregation of the bottom n elements within a group, according to the specified sort order. New in version 5.2. Available in the $group and $setWindowFields stages. |
$count | Returns the number of documents in a group. Distinct from the $count pipeline stage. New in version 5.0: Available in the $group and $setWindowFields stages. |
$first | Returns the result of an expression for the first document in a group. Changed in version 5.0: Available in the $setWindowFields stage. |
$firstN | Returns an aggregation of the first n elements within a group. New in version 5.2: Available in the $group and $setWindowFields stages, and as an expression. |
$last | Returns the result of an expression for the last document in a group. Changed in version 5.0: Available in the $setWindowFields stage. |
$lastN | Returns an aggregation of the last n elements within a group. New in version 5.2: Available in the $group and $setWindowFields stages, and as an expression. |
$max | Returns the highest expression value for each group. Changed in version 5.0: Available in the $setWindowFields stage. |
$maxN | Returns an aggregation of the n maximum valued elements in a group. New in version 5.2. Available in $group, $setWindowFields, and as an expression. |
$median | Returns an approximation of the median, the 50th percentile, as a scalar value. New in version 7.0. This operator is available as an accumulator in these stages: $group and $setWindowFields. It is also available as an aggregation expression. |
$mergeObjects | Returns a document created by combining the input documents for each group. |
$min | Returns the lowest expression value for each group. Changed in version 5.0: Available in the $setWindowFields stage. |
$minN | Returns an aggregation of the n minimum valued elements in a group. New in version 5.2. Available in $group, $setWindowFields, and as an expression. |
$percentile | Returns an array of scalar values that correspond to specified percentile values. New in version 7.0. This operator is available as an accumulator in these stages: $group and $setWindowFields. It is also available as an aggregation expression. |
$push | Returns an array of expression values for documents in each group. Changed in version 5.0: Available in the $setWindowFields stage. |
$stdDevPop | Returns the population standard deviation of the input values. Changed in version 5.0: Available in the $setWindowFields stage. |
$stdDevSamp | Returns the sample standard deviation of the input values. Changed in version 5.0: Available in the $setWindowFields stage. |
$sum | Returns a sum of numerical values. Ignores non-numeric values. Changed in version 5.0: Available in the $setWindowFields stage. |
$top | Returns the top element within a group according to the specified sort order. New in version 5.2. Available in the $group and $setWindowFields stages. |
$topN | Returns an aggregation of the top n elements within a group, according to the specified sort order. New in version 5.2. Available in the $group and $setWindowFields stages. |
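As a rough guide to what a few of these accumulators compute, the following sketch applies simplified JavaScript stand-ins to the values collected for a single group. The `emulated` object and its functions are hypothetical and cover only a small part of the real behavior (for example, only JavaScript numbers count as numeric here).

```javascript
// Each function receives the values an expression produced for the
// documents of ONE group, in input order. Simplified stand-ins only.
const isNumeric = (v) => typeof v === "number";
const sum = (nums) => nums.reduce((a, b) => a + b, 0);

const emulated = {
  $sum: (values) => sum(values.filter(isNumeric)), // ignores non-numeric values
  $avg: (values) => {
    const nums = values.filter(isNumeric); // ignores non-numeric values
    return nums.length > 0 ? sum(nums) / nums.length : null;
  },
  $first: (values) => values[0],
  $last: (values) => values[values.length - 1],
  $push: (values) => [...values], // keeps duplicates
  $addToSet: (values) => [...new Set(values)], // unique values only
};

const quantities = [2, "n/a", 10, 2];
const summary = Object.fromEntries(
  Object.entries(emulated).map(([name, fn]) => [name, fn(quantities)])
);
console.log(summary);
```

Note that `$first` and `$last` are only meaningful when the input documents arrive in a defined order, for example after a $sort stage.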
$group and Memory Restrictions
If the $group stage exceeds 100 megabytes of RAM, MongoDB writes data to temporary files. However, if the allowDiskUse option is set to false, $group returns an error. For more information, refer to Aggregation Pipeline Limits.
$group Performance Optimizations
This section describes optimizations to improve the performance of $group. There are optimizations that you can make manually and optimizations MongoDB makes internally.
Optimization to Return the First or Last Document of Each Group
If a pipeline sorts and groups by the same field and the $group stage only uses the $first or $last accumulator operator, consider adding an index on the grouped field that matches the sort order. In some cases, the $group stage can use the index to quickly find the first or last document of each group.
Example
If a collection named foo contains an index { x: 1, y: 1 }, the following pipeline can use that index to find the first document of each group:
db.foo.aggregate([ { $sort:{ x : 1, y : 1 } }, { $group: { _id: { x : "$x" }, y: { $first : "$y" } } } ])
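The intuition behind this optimization can be sketched in plain JavaScript: once documents are ordered by x and then y (which is the order an index on { x: 1, y: 1 } already stores), the first entry seen for each x is the answer and the remaining entries for that x can be skipped. The data below is made up for illustration, and the sketch says nothing about how the server actually traverses the index.

```javascript
const docs = [
  { x: 2, y: 9 },
  { x: 1, y: 5 },
  { x: 1, y: 3 },
  { x: 2, y: 4 },
];

// Equivalent of { $sort: { x: 1, y: 1 } }; an index provides this order for free.
const sorted = [...docs].sort((a, b) => a.x - b.x || a.y - b.y);

// Equivalent of { $group: { _id: { x: "$x" }, y: { $first: "$y" } } }:
// only the first document of each run of equal x values matters.
const firstPerGroup = new Map();
for (const doc of sorted) {
  if (!firstPerGroup.has(doc.x)) {
    firstPerGroup.set(doc.x, { _id: { x: doc.x }, y: doc.y });
  }
}
const result = [...firstPerGroup.values()];
console.log(result);
```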
Slot-Based Query Execution Engine
Starting in version 5.2, MongoDB uses the slot-based query execution engine to execute $group stages if either:
- $group is the first stage in the pipeline.
- All preceding stages in the pipeline can also be executed by the slot-based execution engine.
For more information, see $group Optimization.
Examples
Count the Number of Documents in a Collection
In mongosh, create a sample collection named sales with the following documents:
db.sales.insertMany([ { "_id" : 1, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("2"), "date" : ISODate("2014-03-01T08:00:00Z") }, { "_id" : 2, "item" : "jkl", "price" : Decimal128("20"), "quantity" : Int32("1"), "date" : ISODate("2014-03-01T09:00:00Z") }, { "_id" : 3, "item" : "xyz", "price" : Decimal128("5"), "quantity" : Int32( "10"), "date" : ISODate("2014-03-15T09:00:00Z") }, { "_id" : 4, "item" : "xyz", "price" : Decimal128("5"), "quantity" : Int32("20") , "date" : ISODate("2014-04-04T11:21:39.736Z") }, { "_id" : 5, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("10") , "date" : ISODate("2014-04-04T21:23:13.331Z") }, { "_id" : 6, "item" : "def", "price" : Decimal128("7.5"), "quantity": Int32("5" ) , "date" : ISODate("2015-06-04T05:08:13Z") }, { "_id" : 7, "item" : "def", "price" : Decimal128("7.5"), "quantity": Int32("10") , "date" : ISODate("2015-09-10T08:43:00Z") }, { "_id" : 8, "item" : "abc", "price" : Decimal128("10"), "quantity" : Int32("5" ) , "date" : ISODate("2016-02-06T20:20:13Z") }, ])
The following aggregation operation uses the $group stage to count the number of documents in the sales collection:
db.sales.aggregate( [ { $group: { _id: null, count: { $count: { } } } } ] )
The operation returns the following result:
{ "_id" : null, "count" : 8 }
This aggregation operation is equivalent to the following SQL statement:
SELECT COUNT(*) AS count FROM sales
Retrieve Distinct Values
The following aggregation operation uses the $group stage to retrieve the distinct item values from the sales collection:
db.sales.aggregate( [ { $group : { _id : "$item" } } ] )
The operation returns the following result:
{ "_id" : "abc" } { "_id" : "jkl" } { "_id" : "def" } { "_id" : "xyz" }
Note
When you use $group to retrieve distinct values in a sharded collection, if the operation results in a DISTINCT_SCAN, the result might contain orphaned documents.
The only semantically correct pipeline that is affected is one that is logically equivalent to a distinct command: a $group stage at or near the beginning of the pipeline that is not preceded by a $sort stage.
For example, $group operations of the following form can result in a DISTINCT_SCAN:
{ $group : { _id : "$<field>" } }
For more information on behavior for retrieving distinct values, see the distinct command behavior.
To see whether your operation results in a DISTINCT_SCAN, check your operation's explain results.
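One crude way to check explain results programmatically is to walk the returned object and look for a plan node whose stage is DISTINCT_SCAN. The sketch below is an assumption-laden helper, not an official API: `hasDistinctScan` is a hypothetical name, `sampleExplain` is a hand-written object that only resembles part of an explain result, and the real explain shape varies by server version and topology.

```javascript
// Recursively search an explain result for any node with
// stage === "DISTINCT_SCAN". This also matches rejected plans,
// so treat a positive result as a hint rather than proof.
function hasDistinctScan(node) {
  if (node === null || typeof node !== "object") return false;
  if (node.stage === "DISTINCT_SCAN") return true;
  return Object.values(node).some(hasDistinctScan);
}

// Hand-written sample for illustration only.
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "PROJECTION_COVERED",
      inputStage: { stage: "DISTINCT_SCAN", indexName: "item_1" },
    },
  },
};
console.log(hasDistinctScan(sampleExplain));
```

In mongosh, the helper could be applied to the output of `db.sales.explain().aggregate( [ { $group : { _id : "$item" } } ] )`.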
Group by Item Having
The following aggregation operation groups documents by the item field, calculating the total sale amount per item and returning only the items with total sale amount greater than or equal to 100:
db.sales.aggregate( [
  // First Stage
  { $group : { _id : "$item", totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } } } },
  // Second Stage
  { $match: { "totalSaleAmount": { $gte: 100 } } }
] )
- First Stage: The $group stage groups the documents by item to retrieve the distinct item values. This stage returns the totalSaleAmount for each item.
- Second Stage: The $match stage filters the resulting documents to only return items with a totalSaleAmount greater than or equal to 100.
The operation returns the following result:
{ "_id" : "abc", "totalSaleAmount" : Decimal128("170") } { "_id" : "xyz", "totalSaleAmount" : Decimal128("150") } { "_id" : "def", "totalSaleAmount" : Decimal128("112.5") }
This aggregation operation is equivalent to the following SQL statement:
SELECT item, Sum(( price * quantity )) AS totalSaleAmount FROM sales GROUP BY item HAVING totalSaleAmount >= 100
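For readers more comfortable with application code than with SQL, the same group-then-filter logic can be sketched in plain JavaScript. This is an in-memory illustration that uses ordinary numbers instead of Decimal128 and Int32, and it is not how the server evaluates the pipeline.

```javascript
// Same data as the sales collection, reduced to the fields this example uses.
const sales = [
  { item: "abc", price: 10, quantity: 2 },
  { item: "jkl", price: 20, quantity: 1 },
  { item: "xyz", price: 5, quantity: 10 },
  { item: "xyz", price: 5, quantity: 20 },
  { item: "abc", price: 10, quantity: 10 },
  { item: "def", price: 7.5, quantity: 5 },
  { item: "def", price: 7.5, quantity: 10 },
  { item: "abc", price: 10, quantity: 5 },
];

// $group: accumulate price * quantity per item.
const totals = new Map();
for (const { item, price, quantity } of sales) {
  totals.set(item, (totals.get(item) ?? 0) + price * quantity);
}

// $match: keep only groups whose total is at least 100 (the HAVING clause).
const bigSellers = [...totals]
  .map(([item, totalSaleAmount]) => ({ _id: item, totalSaleAmount }))
  .filter((doc) => doc.totalSaleAmount >= 100);
console.log(bigSellers);
```

The filter runs on the grouped totals, not on the input documents, which is why the $match stage must come after $group in the pipeline.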
Calculate Count, Sum, and Average
The examples in this section use the sales collection created in the Count the Number of Documents in a Collection example above.
Group by Day of the Year
The following pipeline calculates the total sales amount, average sales quantity, and sale count for each day in the year 2014:
db.sales.aggregate([
  // First Stage
  { $match : { "date": { $gte: new ISODate("2014-01-01"), $lt: new ISODate("2015-01-01") } } },
  // Second Stage
  {
    $group : {
      _id : { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
      totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
      averageQuantity: { $avg: "$quantity" },
      count: { $sum: 1 }
    }
  },
  // Third Stage
  { $sort : { totalSaleAmount: -1 } }
])
- First Stage: The $match stage filters the documents to only pass documents from the year 2014 to the next stage.
- Second Stage: The $group stage groups the documents by date and calculates the total sale amount, average quantity, and total count of the documents in each group.
- Third Stage: The $sort stage sorts the results by the total sale amount for each group in descending order.
The operation returns the following results:
{ "_id" : "2014-04-04", "totalSaleAmount" : Decimal128("200"), "averageQuantity" : 15, "count" : 2 } { "_id" : "2014-03-15", "totalSaleAmount" : Decimal128("50"), "averageQuantity" : 10, "count" : 1 } { "_id" : "2014-03-01", "totalSaleAmount" : Decimal128("40"), "averageQuantity" : 1.5, "count" : 2 }
This aggregation operation is equivalent to the following SQL statement:
SELECT date, Sum(( price * quantity )) AS totalSaleAmount, Avg(quantity) AS averageQuantity, Count(*) AS Count FROM sales WHERE date >= '01/01/2014' AND date < '01/01/2015' GROUP BY date ORDER BY totalSaleAmount DESC
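The three stages can also be traced in plain JavaScript. The sketch below is an in-memory approximation that uses ordinary numbers and a subset of the sample documents, and derives the day string from the UTC date, which is an assumption that matches $dateToString when no timezone is given.

```javascript
const sales = [
  { price: 10, quantity: 2, date: new Date("2014-03-01T08:00:00Z") },
  { price: 20, quantity: 1, date: new Date("2014-03-01T09:00:00Z") },
  { price: 5, quantity: 10, date: new Date("2014-03-15T09:00:00Z") },
  { price: 5, quantity: 20, date: new Date("2014-04-04T11:21:39.736Z") },
  { price: 10, quantity: 10, date: new Date("2014-04-04T21:23:13.331Z") },
  { price: 7.5, quantity: 5, date: new Date("2015-06-04T05:08:13Z") },
];

const start = new Date("2014-01-01");
const end = new Date("2015-01-01");
const groups = new Map();

for (const { price, quantity, date } of sales) {
  if (date < start || date >= end) continue; // First stage: $match
  const day = date.toISOString().slice(0, 10); // group key, "%Y-%m-%d" in UTC
  const g = groups.get(day) ?? { _id: day, totalSaleAmount: 0, quantitySum: 0, count: 0 };
  g.totalSaleAmount += price * quantity; // $sum of $multiply
  g.quantitySum += quantity; // running sum used to derive $avg
  g.count += 1; // $sum: 1
  groups.set(day, g);
}

const dailyTotals = [...groups.values()]
  .map(({ quantitySum, ...g }) => ({ ...g, averageQuantity: quantitySum / g.count }))
  .sort((a, b) => b.totalSaleAmount - a.totalSaleAmount); // Third stage: $sort
console.log(dailyTotals);
```

An average cannot be updated from the previous average alone, so the sketch carries a running sum and count per group and divides at the end.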
See also:
db.collection.countDocuments(), which wraps the $group aggregation stage with a $sum expression.
Group by null
The following aggregation operation specifies a group _id of null, calculating the total sale amount, average quantity, and count of all documents in the collection:
db.sales.aggregate([ { $group : { _id : null, totalSaleAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } }, averageQuantity: { $avg: "$quantity" }, count: { $sum: 1 } } } ])
The operation returns the following result:
{ "_id" : null, "totalSaleAmount" : Decimal128("452.5"), "averageQuantity" : 7.875, "count" : 8 }
This aggregation operation is equivalent to the following SQL statement:
SELECT Sum(price * quantity) AS totalSaleAmount, Avg(quantity) AS averageQuantity, Count(*) AS Count FROM sales
See also:
db.collection.countDocuments(), which wraps the $group aggregation stage with a $sum expression.
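To illustrate what "wraps the $group aggregation stage with a $sum expression" means, the sketch below builds a pipeline of the kind countDocuments() is documented to run: a $match on the query followed by a $group with a null key and a $sum of 1. The helper name `countPipeline` and the output field name `n` are assumptions made for this example.

```javascript
// Builds a counting pipeline: filter, then collapse everything into one
// group and add 1 per document. countPipeline is a hypothetical helper.
function countPipeline(query = {}) {
  return [
    { $match: query },
    { $group: { _id: null, n: { $sum: 1 } } },
  ];
}

console.log(JSON.stringify(countPipeline({ item: "abc" })));
```

In mongosh, `db.sales.aggregate(countPipeline({ item: "abc" }))` should return a single document whose `n` field holds the number of matching documents.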
Pivot Data
In mongosh, create a sample collection named books with the following documents:
db.books.insertMany([ { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }, { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }, { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 }, { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }, { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 } ])
Group title by author
The following aggregation operation pivots the data in the books collection to have titles grouped by authors.
db.books.aggregate([ { $group : { _id : "$author", books: { $push: "$title" } } } ])
The operation returns the following documents:
{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] } { "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
Group Documents by author
The following aggregation operation groups documents by author:
db.books.aggregate([
  // First Stage
  { $group : { _id : "$author", books: { $push: "$$ROOT" } } },
  // Second Stage
  { $addFields: { totalCopies : { $sum: "$books.copies" } } }
])
- First Stage: $group uses the $$ROOT system variable to group the entire documents by authors. This stage passes the following documents to the next stage:
{ "_id" : "Homer", "books" : [ { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }, { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 } ] }
{ "_id" : "Dante", "books" : [ { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }, { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }, { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 } ] }
- Second Stage: $addFields adds a field to the output containing the total copies of books for each author.
Note
The resulting documents must not exceed the BSON Document Size limit of 16 mebibytes.
The operation returns the following documents:
{ "_id" : "Homer", "books" : [ { "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 }, { "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 } ], "totalCopies" : 20 } { "_id" : "Dante", "books" : [ { "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 }, { "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 }, { "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 } ], "totalCopies" : 5 }
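The two stages map onto plain JavaScript fairly directly: pushing the whole document corresponds to `$push: "$$ROOT"`, and summing a field across the collected array corresponds to `$sum: "$books.copies"`. The following is an in-memory sketch of that idea, not the server's implementation.

```javascript
const books = [
  { _id: 8751, title: "The Banquet", author: "Dante", copies: 2 },
  { _id: 8752, title: "Divine Comedy", author: "Dante", copies: 1 },
  { _id: 8645, title: "Eclogues", author: "Dante", copies: 2 },
  { _id: 7000, title: "The Odyssey", author: "Homer", copies: 10 },
  { _id: 7020, title: "Iliad", author: "Homer", copies: 10 },
];

// First stage: collect each whole document under its author.
const byAuthor = new Map();
for (const book of books) {
  if (!byAuthor.has(book.author)) {
    byAuthor.set(book.author, { _id: book.author, books: [] });
  }
  byAuthor.get(book.author).books.push(book); // $push: "$$ROOT"
}

// Second stage: add a field computed from the collected array.
const withTotals = [...byAuthor.values()].map((group) => ({
  ...group,
  totalCopies: group.books.reduce((sum, b) => sum + b.copies, 0), // $sum: "$books.copies"
}));
console.log(withTotals.map(({ _id, totalCopies }) => ({ _id, totalCopies })));
```

Because every grouped document is stored inside its group, the output grows with the input, which is why the document size limit mentioned above matters for this pattern.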
The C# examples on this page use the sample_mflix database from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started in the MongoDB .NET/C# Driver documentation.
The following Movie class models the documents in the sample_mflix.movies collection:
public class Movie
{
    public string Id { get; set; }
    public int Runtime { get; set; }
    public string Title { get; set; }
    public string Rated { get; set; }
    public List<string> Genres { get; set; }
    public string Plot { get; set; }
    public ImdbData Imdb { get; set; }
    public int Year { get; set; }
    public int Index { get; set; }
    public string[] Comments { get; set; }
    [BsonElement("lastupdated")]
    public DateTime LastUpdated { get; set; }
}
To use the MongoDB .NET/C# driver to add a $group stage to an aggregation pipeline, call the Group() method on a PipelineDefinition object.
The following example creates a pipeline stage that groups documents by the value of their Rated field. Each group's rating is shown in a field named Rating in each output document. Each output document also contains a field named TotalRuntime, whose value is the total runtime of all movies in the group.
var pipeline = new EmptyPipelineDefinition<Movie>()
    .Group(
        id: m => m.Rated,
        group: g => new
        {
            Rating = g.Key,
            TotalRuntime = g.Sum(m => m.Runtime)
        }
    );
Additional Resources
The Aggregation with the Zip Code Data Set tutorial provides an extensive example of the $group operator in a common use case.