This example processes the math and science exam results of seven brothers.
Filter
$ nl filter.jql
1 [
2 { id: 1, math: 10, science: 100 },
3 { id: 2, math: 90, science: 30 },
4 { id: 3, math: 20, science: 80 },
5 { id: 4, math: 40, science: 60 },
6 { id: 5, math: 80, science: 40 },
7 { id: 6, math: 60, science: 90 },
8 { id: 7, math: 70, science: 10 }
9 ] -> write(hdfs("exam"));
10 read(hdfs("exam")) -> filter $.math > 70 or $.science >= 90;
11 // Lines 12-13 below are equivalent to line 10
12 // $exam = read(hdfs("exam"));
13 // $exam -> filter $.math > 70 or $.science >= 90;
14 quit;
$ jaqlshell -cb filter.jql
{
"location": "exam",
"type": "hdfs"
}
[
{
"id": 1,
"math": 10,
"science": 100
},
{
"id": 2,
"math": 90,
"science": 30
},
{
"id": 5,
"math": 80,
"science": 40
},
{
"id": 6,
"math": 60,
"science": 90
}
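]
filter reads the JSON data one entry at a time and outputs each entry that satisfies the condition unchanged. The $ symbol refers to the entry currently being processed.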
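In addition, prefixing an expression that has no sink with "variable name =" assigns its result to that variable. (Note, however, that MapReduce actually runs only when an expression with a real sink, such as file output or standard output, is evaluated.)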
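As a rough analogy in plain Python (not Jaql), the filter on line 10 and the deferred evaluation described above can be sketched as follows. The generator expression stands in for an expression without a sink, and `list()` stands in for a real sink; this mapping is only an illustration, not how Jaql is implemented.

```python
# Plain-Python analogy of `filter $.math > 70 or $.science >= 90` (not Jaql).
exam = [
    {"id": 1, "math": 10, "science": 100},
    {"id": 2, "math": 90, "science": 30},
    {"id": 3, "math": 20, "science": 80},
    {"id": 4, "math": 40, "science": 60},
    {"id": 5, "math": 80, "science": 40},
    {"id": 6, "math": 60, "science": 90},
    {"id": 7, "math": 70, "science": 10},
]

# Like assigning to a variable without a sink: nothing is evaluated yet.
matched = (e for e in exam if e["math"] > 70 or e["science"] >= 90)

# Consuming the generator plays the role of evaluating an expression with a sink.
result = list(matched)
print([e["id"] for e in result])  # [1, 2, 5, 6]
```

The entry with id 7 is excluded because its math score is exactly 70, which does not satisfy `> 70`.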
Transform
$ nl transform.jql
1 $exam = read(hdfs("exam"));
2 $exam -> transform { id: $.id, total: $.math + $.science };
3 $exam -> transform { id: $.id, total: $.math + $.science }
4 -> filter $.total > 100;
5 quit;
$ jaqlshell -cb transform.jql
[
{
"id": 1,
"total": 110
},
{
"id": 2,
"total": 120
},
{
"id": 3,
"total": 100
},
{
"id": 4,
"total": 100
},
{
"id": 5,
"total": 120
},
{
"id": 6,
"total": 150
},
{
"id": 7,
"total": 80
}
]
[
{
"id": 1,
"total": 110
},
{
"id": 2,
"total": 120
},
{
"id": 5,
"total": 120
},
{
"id": 6,
"total": 150
}
]
transform reads the JSON data one entry at a time and outputs one newly constructed entry for each. Lines 3-4 show an example that chains transform and filter.
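For readers more familiar with Python, the same pipeline can be approximated with list comprehensions. This is only a sketch of the semantics in plain Python, not Jaql, and the names `totals` and `high` are made up for the illustration.

```python
# Plain-Python analogy of transform followed by filter (not Jaql).
exam = [
    {"id": 1, "math": 10, "science": 100},
    {"id": 2, "math": 90, "science": 30},
    {"id": 3, "math": 20, "science": 80},
    {"id": 4, "math": 40, "science": 60},
    {"id": 5, "math": 80, "science": 40},
    {"id": 6, "math": 60, "science": 90},
    {"id": 7, "math": 70, "science": 10},
]

# transform: build one new entry from each input entry.
totals = [{"id": e["id"], "total": e["math"] + e["science"]} for e in exam]

# filter: keep only the new entries whose total exceeds 100.
high = [t for t in totals if t["total"] > 100]

print([t["total"] for t in totals])  # [110, 120, 100, 100, 120, 150, 80]
print([t["id"] for t in high])       # [1, 2, 5, 6]
```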
Sort
$ nl sort.jql
1 $exam = read(hdfs("exam"));
2 $exam -> transform { id: $.id, total: $.math + $.science }
3 -> sort by [ $.total asc, $.id desc ];
4 quit;
$ jaqlshell -cb sort.jql
[
{
"id": 7,
"total": 80
},
{
"id": 4,
"total": 100
},
{
"id": 3,
"total": 100
},
{
"id": 1,
"total": 110
},
{
"id": 5,
"total": 120
},
{
"id": 2,
"total": 120
},
{
"id": 6,
"total": 150
}
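]
As the name suggests, sort orders the data by the specified fields. Here the entries are sorted by total in ascending order, and entries with the same total are ordered by id in descending order.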
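The two-key ordering can be mimicked in plain Python (not Jaql) with a tuple sort key. Negating `id` to get descending order is a Python idiom chosen for this sketch and only works because the ids are numeric.

```python
# Plain-Python analogy of `sort by [ $.total asc, $.id desc ]` (not Jaql).
totals = [
    {"id": 1, "total": 110},
    {"id": 2, "total": 120},
    {"id": 3, "total": 100},
    {"id": 4, "total": 100},
    {"id": 5, "total": 120},
    {"id": 6, "total": 150},
    {"id": 7, "total": 80},
]

# Primary key: total ascending. Secondary key: id descending (hence the minus sign).
ordered = sorted(totals, key=lambda t: (t["total"], -t["id"]))

print([t["id"] for t in ordered])  # [7, 4, 3, 1, 5, 2, 6]
```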
Join
$ nl join.jql
1 [
2 { id: 1, name: "Osomatsu" },
3 { id: 2, name: "Karamatsu" },
4 { id: 3, name: "Choromatsu" },
5 { id: 4, name: "Ichimatsu" },
6 { id: 5, name: "Jyushimatsu" },
7 { id: 6, name: "Todomatsu" }
8 ] -> write(hdfs("brothers"));
9 $exam = read(hdfs("exam"));
10 $brothers = read(hdfs("brothers"));
11 join $exam, $brothers where $exam.id == $brothers.id
12 into { name: $brothers.name, math: $exam.math, science: $exam.science };
13 join $exam, $brothers where $exam.id == $brothers.id
14 into { name: $brothers.name, total: $exam.math + $exam.science };
15 quit;
$ jaqlshell -cb join.jql
{
"location": "brothers",
"type": "hdfs"
}
[
{
"name": "Osomatsu",
"math": 10,
"science": 100
},
{
"name": "Karamatsu",
"math": 90,
"science": 30
},
{
"name": "Choromatsu",
"math": 20,
"science": 80
},
{
"name": "Ichimatsu",
"math": 40,
"science": 60
},
{
"name": "Jyushimatsu",
"math": 80,
"science": 40
},
{
"name": "Todomatsu",
"math": 60,
"science": 90
}
]
[
{
"name": "Osomatsu",
"total": 110
},
{
"name": "Karamatsu",
"total": 120
},
{
"name": "Choromatsu",
"total": 100
},
{
"name": "Ichimatsu",
"total": 100
},
{
"name": "Jyushimatsu",
"total": 120
},
{
"name": "Todomatsu",
"total": 150
}
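]
join combines two data sets. It corresponds to an inner join in SQL, so the exam entry with id 7, which has no match in brothers, does not appear in the output. The output records are specified in the into clause, which can also contain expressions, as in the second example.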
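The inner-join behavior can be sketched in plain Python (not Jaql) with a dictionary lookup on id. The lookup-table approach is simply one way to express the semantics and says nothing about how Jaql executes the join.

```python
# Plain-Python analogy of the second join example (not Jaql).
exam = [
    {"id": 1, "math": 10, "science": 100},
    {"id": 2, "math": 90, "science": 30},
    {"id": 3, "math": 20, "science": 80},
    {"id": 4, "math": 40, "science": 60},
    {"id": 5, "math": 80, "science": 40},
    {"id": 6, "math": 60, "science": 90},
    {"id": 7, "math": 70, "science": 10},
]
brothers = [
    {"id": 1, "name": "Osomatsu"},
    {"id": 2, "name": "Karamatsu"},
    {"id": 3, "name": "Choromatsu"},
    {"id": 4, "name": "Ichimatsu"},
    {"id": 5, "name": "Jyushimatsu"},
    {"id": 6, "name": "Todomatsu"},
]

# Index one side by the join key.
name_by_id = {b["id"]: b["name"] for b in brothers}

# Inner join: exam entries without a matching id (here id 7) are dropped.
joined = [
    {"name": name_by_id[e["id"]], "total": e["math"] + e["science"]}
    for e in exam
    if e["id"] in name_by_id
]

print(len(joined))  # 6
print(joined[-1])   # {'name': 'Todomatsu', 'total': 150}
```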
Outer Join
$ nl outerjoin.jql
1 [
2 { id: 1, name: "Osomatsu" },
3 { id: 2, name: "Karamatsu" },
4 { id: 3, name: "Choromatsu" },
5 ] -> write(hdfs("partial_brothers"));
6 $exam = read(hdfs("exam"));
7 $brothers = read(hdfs("partial_brothers"));
8 join $exam, $brothers where $exam.id == $brothers.id
9 into { id: $exam.id, name: $brothers.name };
10 join preserve $exam, $brothers where $exam.id == $brothers.id
11 into { id: $exam.id, name: $brothers.name };
12 quit;
$ jaqlshell -cb outerjoin.jql
{
"location": "partial_brothers",
"type": "hdfs"
}
[
{
"id": 1,
"name": "Osomatsu"
},
{
"id": 2,
"name": "Karamatsu"
},
{
"id": 3,
"name": "Choromatsu"
}
]
[
{
"id": 1,
"name": "Osomatsu"
},
{
"id": 2,
"name": "Karamatsu"
},
{
"id": 3,
"name": "Choromatsu"
},
{
"id": 4,
"name": null
},
{
"id": 5,
"name": null
},
{
"id": 6,
"name": null
},
{
"id": 7,
"name": null
}
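]
In the join shown earlier, specifying preserve on a variable turns it into an outer join. Entries of the preserved input (here $exam) that have no match are kept, and fields taken from the other input become null.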
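In plain Python (not Jaql), the effect of preserve on $exam resembles a lookup that falls back to `None` when no match exists. This is a minimal sketch of the semantics only.

```python
# Plain-Python analogy of `join preserve $exam, $brothers ...` (not Jaql).
exam_ids = [1, 2, 3, 4, 5, 6, 7]
partial_brothers = [
    {"id": 1, "name": "Osomatsu"},
    {"id": 2, "name": "Karamatsu"},
    {"id": 3, "name": "Choromatsu"},
]

name_by_id = {b["id"]: b["name"] for b in partial_brothers}

# Every exam id is kept; dict.get returns None where there is no matching brother.
outer = [{"id": i, "name": name_by_id.get(i)} for i in exam_ids]

print(outer[2])  # {'id': 3, 'name': 'Choromatsu'}
print(outer[3])  # {'id': 4, 'name': None}
```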
Group
$ nl groupby.jql
1 [
2 { id: 1, name: "Osomatsu", class: 1 },
3 { id: 2, name: "Karamatsu", class: 1 },
4 { id: 3, name: "Choromatsu", class: 2 },
5 { id: 4, name: "Ichimatsu", class: 2 },
6 { id: 5, name: "Jyushimatsu", class: 2 },
7 { id: 6, name: "Todomatsu", class: 3 }
8 ] -> write(hdfs("brothers_with_class"));
9 $exam = read(hdfs("exam"));
10 $brothers = read(hdfs("brothers_with_class"));
11 join $exam, $brothers where $exam.id == $brothers.id
12 into { class: $brothers.class, math: $exam.math, science: $exam.science }
13 -> group by $class_group = $.class
14 into { class: $class_group, members: count( $[*].id ),
15 avg_math: avg( $[*].math ), avg_science: avg( $[*].science ) };
$ jaqlshell -cb groupby.jql
{
"location": "brothers_with_class",
"type": "hdfs"
}
[
{
"class": 1,
"members": 2,
"avg_math": 50,
"avg_science": 65
},
{
"class": 2,
"members": 3,
"avg_math": 46,
"avg_science": 60
},
{
"class": 3,
"members": 1,
"avg_math": 60,
"avg_science": 90
}
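]
group collects the entries that share the same value for a given field and outputs one entry per group. All entries in a group are referenced as $[*], to which aggregate functions such as count, sum, and avg are applied. Note that avg_math for class 2 is printed as 46 although the exact mean of 20, 40, and 80 is about 46.67, which suggests that the average of integer values is truncated to an integer here.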
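The grouping step can be approximated in plain Python (not Jaql) as follows. The input rows are the class, math, and science values that the join produces for ids 1 to 6. Integer division is used only to reproduce the 46 shown in the output above; whether Jaql's avg truncates in general is an assumption of this sketch.

```python
from collections import defaultdict

# Rows as produced by the join step: one per brother with a class.
rows = [
    {"class": 1, "math": 10, "science": 100},
    {"class": 1, "math": 90, "science": 30},
    {"class": 2, "math": 20, "science": 80},
    {"class": 2, "math": 40, "science": 60},
    {"class": 2, "math": 80, "science": 40},
    {"class": 3, "math": 60, "science": 90},
]

# Collect the rows of each class; each list plays the role of $[*].
groups = defaultdict(list)
for r in rows:
    groups[r["class"]].append(r)

# One output entry per group, with count and (integer) averages.
summary = [
    {
        "class": c,
        "members": len(members),
        "avg_math": sum(m["math"] for m in members) // len(members),
        "avg_science": sum(m["science"] for m in members) // len(members),
    }
    for c, members in sorted(groups.items())
]

print(summary[1])
# {'class': 2, 'members': 3, 'avg_math': 46, 'avg_science': 60}
```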