Alex Liang

MongoDB Aggregation介紹

實際使用mongoDB時,遇到無法簡單下query得到結果的情況(例如計算不同科目上課學生的平均分數),我們會使用aggregation。

先來看一段aggregation的例子:

1
2
3
4
5
6
7
8
9
10
db.companies.aggregate( [
{ $match: { "relationships.person": { $ne: null } } },
{ $project: { relationships: 1, _id: 0 } },
{ $unwind: "$relationships" },
{ $group: {
_id: "$relationships.person",
count: { $sum: 1 }
} },
{ $sort: { count: -1 } }
] )

aggregation的語法是以[]封裝所有的流程,裡面每一個{}可視為stage,類似linux command line可以將shell script的結果傳給下一個;以上例來說,先找出companies裡relationships的person不為空值的結果,將其傳給下一階段。

而常見的stage有$match, $project, $unwind和$group

$match能夠找出指定條件的document
$project和query的使用方式一樣,選擇結果想要出現的欄位
$unwind則是將document的陣列元素打散產生數個document,使新的document只出現一個該陣列元素
$group能將collection裡依條件做分群的動作,之後還能對新document做數值運算

再回到上例,$match的結果到$project stage時,選擇只要relationships這個陣列的資料,在$unwind根據relationships產生數個document,每個document就是一筆陣列元素。接下來$group把relationships的person當作新document的_id,以person分群並計算同一個人參與多少公司。最後根據count由大排到小。

一般建議的作法是先使用$match或$unwind限縮document的數量,讓之後的處理能加快。

下一個例子是找出學校科目中學生平均最高的例子,document內容:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
{
"_id" : ObjectId("50b59cd75bed76f46522c392"),
"student_id" : 10,
"class_id" : 5,
"scores" : [
{
"type" : "exam",
"score" : 69.17634380939022
},
{
"type" : "quiz",
"score" : 61.20182926719762
},
{
"type" : "homework",
"score" : 73.3293624199466
},
{
"type" : "homework",
"score" : 15.206314042622903
},
{
"type" : "homework",
"score" : 36.75297723087603
},
{
"type" : "homework",
"score" : 64.42913107330241
}
]
}

我們要找的目標只包含homework和exam,不包括quiz,所以第一階階使用$match和$unwind

1
2
3
4
> db.grades.aggregate( [ 
{ $unwind: "$scores" },
{ $match: { "scores.type": { $ne: "quiz" } } }
])

結果:

1
2
3
4
5
6
{ "_id" : ObjectId("50b59cd75bed76f46522c34e"), "student_id" : 0, "class_id" : 2, "scores" : { "type" : "exam", "score" : 57.92947112575566 } }
{ "_id" : ObjectId("50b59cd75bed76f46522c34e"), "student_id" : 0, "class_id" : 2, "scores" : { "type" : "homework", "score" : 68.1956781058743 } }
{ "_id" : ObjectId("50b59cd75bed76f46522c34e"), "student_id" : 0, "class_id" : 2, "scores" : { "type" : "homework", "score" : 67.95019716560351 } }
{ "_id" : ObjectId("50b59cd75bed76f46522c34e"), "student_id" : 0, "class_id" : 2, "scores" : { "type" : "homework", "score" : 18.81037253352722 } }
{ "_id" : ObjectId("50b59cd75bed76f46522c351"), "student_id" : 0, "class_id" : 16, "scores" : { "type" : "exam", "score" : 59.1805667559299 } }
{ "_id" : ObjectId("50b59cd75bed76f46522c351"), "student_id" : 0, "class_id" : 16, "scores" : { "type" : "homework", "score" : 6.48470951607214 } }

再來做group,以’class_id’為新collection的_id並計算該科的平均分數,最後再做排序:

1
2
3
4
5
6
> db.grades.aggregate( [ 
{ $unwind: "$scores" },
{ $match: { "scores.type": { $ne: "quiz" } } },
{ $group: { _id: { "class": "$class_id" }, avg: { $avg: "$scores.score" } } },
{ $sort: { avg: -1 }}
])
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
{ "_id" : { "class" : 1 }, "avg" : 64.89462395242741 }
{ "_id" : { "class" : 9 }, "avg" : 58.32770509408073 }
{ "_id" : { "class" : 5 }, "avg" : 56.80408386154663 }
{ "_id" : { "class" : 26 }, "avg" : 55.73775030281308 }
{ "_id" : { "class" : 4 }, "avg" : 55.51344573971435 }
{ "_id" : { "class" : 20 }, "avg" : 54.06379996612098 }
{ "_id" : { "class" : 24 }, "avg" : 53.72872750965091 }
{ "_id" : { "class" : 16 }, "avg" : 53.36304538085671 }
{ "_id" : { "class" : 13 }, "avg" : 52.10852066426234 }
{ "_id" : { "class" : 14 }, "avg" : 51.843459264371795 }
{ "_id" : { "class" : 23 }, "avg" : 51.55076295987544 }
{ "_id" : { "class" : 11 }, "avg" : 51.52594302618334 }
{ "_id" : { "class" : 3 }, "avg" : 51.415324716021125 }
{ "_id" : { "class" : 29 }, "avg" : 50.190997428907586 }
{ "_id" : { "class" : 0 }, "avg" : 49.79223626365225 }
{ "_id" : { "class" : 17 }, "avg" : 49.63426020975154 }
{ "_id" : { "class" : 6 }, "avg" : 49.51404572271715 }
{ "_id" : { "class" : 27 }, "avg" : 49.195554303080606 }
{ "_id" : { "class" : 19 }, "avg" : 48.92999492446235 }
{ "_id" : { "class" : 7 }, "avg" : 48.78360353370019 }

官方參考文件