This section processes the math and science exam results of seven brothers.
Filter
    $ nl filter.jql
         1  [
         2    { id: 1, math: 10, science: 100 },
         3    { id: 2, math: 90, science: 30 },
         4    { id: 3, math: 20, science: 80 },
         5    { id: 4, math: 40, science: 60 },
         6    { id: 5, math: 80, science: 40 },
         7    { id: 6, math: 60, science: 90 },
         8    { id: 7, math: 70, science: 10 }
         9  ] -> write(hdfs("exam"));
        10  read(hdfs("exam")) -> filter $.math > 70 or $.science >= 90;
        11  // Lines 12-13 below are equivalent to line 10
        12  // $exam = read(hdfs("exam"));
        13  // $exam -> filter $.math > 70 or $.science >= 90;
        14  quit;

    $ jaqlshell -cb filter.jql
    {
      "location": "exam",
      "type": "hdfs"
    }
    [
      { "id": 1, "math": 10, "science": 100 },
      { "id": 2, "math": 90, "science": 30 },
      { "id": 5, "math": 80, "science": 40 },
      { "id": 6, "math": 60, "science": 90 }
    ]
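filter reads the JSON data one entry at a time and outputs every entry that satisfies the condition, unchanged. The $ symbol refers to the entry currently being processed.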
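For readers more at home in Python, the filter expression in filter.jql behaves roughly like a list comprehension in which the loop variable plays the role of $. This is only a sketch of the semantics, not of how Jaql executes the query on Hadoop; the names exam and selected are illustrative.

```python
# Exam records copied from filter.jql.
exam = [
    {"id": 1, "math": 10, "science": 100},
    {"id": 2, "math": 90, "science": 30},
    {"id": 3, "math": 20, "science": 80},
    {"id": 4, "math": 40, "science": 60},
    {"id": 5, "math": 80, "science": 40},
    {"id": 6, "math": 60, "science": 90},
    {"id": 7, "math": 70, "science": 10},
]

# Counterpart of: filter $.math > 70 or $.science >= 90
# Each record `e` stands in for Jaql's `$`; matching records pass through unchanged.
selected = [e for e in exam if e["math"] > 70 or e["science"] >= 90]

print([e["id"] for e in selected])  # [1, 2, 5, 6]
```

Note that id 7 (math 70) is excluded because the comparison is strictly greater than 70.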
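In addition, when a statement that has no sink is prefixed with "variable name =", its result is assigned to that variable. (MapReduce does not actually run at that point, however. It runs when a statement with a "real sink", such as file output or standard output, is evaluated.)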
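The deferred execution described here is loosely comparable to Python generators: building a pipeline does no work until something consumes it. The sketch below is an analogy only; read_exam and log are hypothetical names, and nothing here reflects Jaql's actual implementation.

```python
log = []

def read_exam():
    # Hypothetical stand-in for read(hdfs("exam")); notes when data is really read.
    log.append("read")
    yield {"id": 1, "math": 10, "science": 100}
    yield {"id": 2, "math": 90, "science": 30}

# Like assigning to a variable: the pipeline is only described, nothing is read yet.
exam = (e for e in read_exam() if e["math"] > 70)
ran_before = list(log)

# Consuming the pipeline plays the role of a real sink and triggers the work.
result = list(exam)
ran_after = list(log)

print(ran_before, ran_after)  # [] ['read']
```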
Transform
    $ nl transform.jql
         1  $exam = read(hdfs("exam"));
         2  $exam -> transform { id: $.id, total: $.math + $.science };
         3  $exam -> transform { id: $.id, total: $.math + $.science }
         4    -> filter $.total > 100;
         5  quit;

    $ jaqlshell -cb transform.jql
    [
      { "id": 1, "total": 110 },
      { "id": 2, "total": 120 },
      { "id": 3, "total": 100 },
      { "id": 4, "total": 100 },
      { "id": 5, "total": 120 },
      { "id": 6, "total": 150 },
      { "id": 7, "total": 80 }
    ]
    [
      { "id": 1, "total": 110 },
      { "id": 2, "total": 120 },
      { "id": 5, "total": 120 },
      { "id": 6, "total": 150 }
    ]
transform reads the JSON data one entry at a time and outputs one newly constructed entry for each input entry. Lines 3-4 show transform chained with filter.
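As a rough Python analogue, transform maps each record to a new record, and piping the result into filter is a second pass over the mapped records. The variable names totals and high are illustrative, and the sketch says nothing about how Jaql compiles the pipeline.

```python
# Exam records copied from filter.jql.
exam = [
    {"id": 1, "math": 10, "science": 100},
    {"id": 2, "math": 90, "science": 30},
    {"id": 3, "math": 20, "science": 80},
    {"id": 4, "math": 40, "science": 60},
    {"id": 5, "math": 80, "science": 40},
    {"id": 6, "math": 60, "science": 90},
    {"id": 7, "math": 70, "science": 10},
]

# Counterpart of: transform { id: $.id, total: $.math + $.science }
totals = [{"id": e["id"], "total": e["math"] + e["science"]} for e in exam]

# Counterpart of the chained step: -> filter $.total > 100
high = [t for t in totals if t["total"] > 100]

print([t["id"] for t in high])  # [1, 2, 5, 6]
```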
Sort
    $ nl sort.jql
         1  $exam = read(hdfs("exam"));
         2  $exam -> transform { id: $.id, total: $.math + $.science }
         3    -> sort by [ $.total asc, $.id desc ];
         4  quit;

    $ jaqlshell -cb sort.jql
    [
      { "id": 7, "total": 80 },
      { "id": 4, "total": 100 },
      { "id": 3, "total": 100 },
      { "id": 1, "total": 110 },
      { "id": 5, "total": 120 },
      { "id": 2, "total": 120 },
      { "id": 6, "total": 150 }
    ]
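As the name suggests, sort orders the data by the specified fields. Each key is given a direction (asc or desc). In this example the entries are ordered by ascending total, and entries with equal totals are ordered by descending id.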
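The two-key ordering can be reproduced in Python with a tuple sort key, negating id to get descending order on the second key. This is a sketch of the ordering rule only; totals holds the values produced by the transform step, and ranked is an illustrative name.

```python
# Totals as produced by the transform step (id, math + science).
totals = [
    {"id": 1, "total": 110},
    {"id": 2, "total": 120},
    {"id": 3, "total": 100},
    {"id": 4, "total": 100},
    {"id": 5, "total": 120},
    {"id": 6, "total": 150},
    {"id": 7, "total": 80},
]

# Counterpart of: sort by [ $.total asc, $.id desc ]
# Negating the numeric id turns the ascending tuple sort into "id desc" for ties.
ranked = sorted(totals, key=lambda t: (t["total"], -t["id"]))

print([t["id"] for t in ranked])  # [7, 4, 3, 1, 5, 2, 6]
```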
Join
    $ nl join.jql
         1  [
         2    { id: 1, name: "Osomatsu" },
         3    { id: 2, name: "Karamatsu" },
         4    { id: 3, name: "Choromatsu" },
         5    { id: 4, name: "Ichimatsu" },
         6    { id: 5, name: "Jyushimatsu" },
         7    { id: 6, name: "Todomatsu" }
         8  ] -> write(hdfs("brothers"));
         9  $exam = read(hdfs("exam"));
        10  $brothers = read(hdfs("brothers"));
        11  join $exam, $brothers where $exam.id == $brothers.id
        12    into { name: $brothers.name, math: $exam.math, science: $exam.science };
        13  join $exam, $brothers where $exam.id == $brothers.id
        14    into { name: $brothers.name, total: $exam.math + $exam.science };
        15  quit;

    $ jaqlshell -cb join.jql
    {
      "location": "brothers",
      "type": "hdfs"
    }
    [
      { "name": "Osomatsu", "math": 10, "science": 100 },
      { "name": "Karamatsu", "math": 90, "science": 30 },
      { "name": "Choromatsu", "math": 20, "science": 80 },
      { "name": "Ichimatsu", "math": 40, "science": 60 },
      { "name": "Jyushimatsu", "math": 80, "science": 40 },
      { "name": "Todomatsu", "math": 60, "science": 90 }
    ]
    [
      { "name": "Osomatsu", "total": 110 },
      { "name": "Karamatsu", "total": 120 },
      { "name": "Choromatsu", "total": 100 },
      { "name": "Ichimatsu", "total": 100 },
      { "name": "Jyushimatsu", "total": 120 },
      { "name": "Todomatsu", "total": 150 }
    ]
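join combines two data sets and corresponds to an SQL inner join. The into clause specifies the shape of the output entries, and it can contain computed expressions, as in the second example. Because this is an inner join, the exam entry with id 7, which has no counterpart in brothers, does not appear in the output.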
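The inner-join behavior can be sketched in Python with a dictionary lookup keyed on id. This illustrates the matching rule only, under the assumption that id is unique in brothers; name_by_id and joined are illustrative names, and Jaql on MapReduce does not necessarily preserve this output order.

```python
exam = [
    {"id": 1, "math": 10, "science": 100},
    {"id": 2, "math": 90, "science": 30},
    {"id": 3, "math": 20, "science": 80},
    {"id": 4, "math": 40, "science": 60},
    {"id": 5, "math": 80, "science": 40},
    {"id": 6, "math": 60, "science": 90},
    {"id": 7, "math": 70, "science": 10},
]
brothers = [
    {"id": 1, "name": "Osomatsu"},
    {"id": 2, "name": "Karamatsu"},
    {"id": 3, "name": "Choromatsu"},
    {"id": 4, "name": "Ichimatsu"},
    {"id": 5, "name": "Jyushimatsu"},
    {"id": 6, "name": "Todomatsu"},
]

# Index one side by the join key (assumes ids are unique in brothers).
name_by_id = {b["id"]: b["name"] for b in brothers}

# Inner join: keep only exam records whose id has a match, then build the
# output record the way the into clause does.
joined = [
    {"name": name_by_id[e["id"]], "total": e["math"] + e["science"]}
    for e in exam
    if e["id"] in name_by_id
]

print(len(joined))  # 6 (exam id 7 has no match and is dropped)
```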
Outer Join
    $ nl outerjoin.jql
         1  [
         2    { id: 1, name: "Osomatsu" },
         3    { id: 2, name: "Karamatsu" },
         4    { id: 3, name: "Choromatsu" },
         5  ] -> write(hdfs("partial_brothers"));
         6  $exam = read(hdfs("exam"));
         7  $brothers = read(hdfs("partial_brothers"));
         8  join $exam, $brothers where $exam.id == $brothers.id
         9    into { id: $exam.id, name: $brothers.name };
        10  join preserve $exam, $brothers where $exam.id == $brothers.id
        11    into { id: $exam.id, name: $brothers.name };
        12  quit;

    $ jaqlshell -cb outerjoin.jql
    {
      "location": "partial_brothers",
      "type": "hdfs"
    }
    [
      { "id": 1, "name": "Osomatsu" },
      { "id": 2, "name": "Karamatsu" },
      { "id": 3, "name": "Choromatsu" }
    ]
    [
      { "id": 1, "name": "Osomatsu" },
      { "id": 2, "name": "Karamatsu" },
      { "id": 3, "name": "Choromatsu" },
      { "id": 4, "name": null },
      { "id": 5, "name": null },
      { "id": 6, "name": null },
      { "id": 7, "name": null }
    ]
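Adding preserve to a variable in the join shown earlier turns it into an outer join. Every entry of the preserved input ($exam here) appears in the output, and fields taken from the other input are null where no match exists.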
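In Python terms, preserving $exam corresponds to iterating over every exam record and looking up the other side with a default of None (the counterpart of JSON null). This is a sketch of the left-outer-join rule for this example; name_by_id and outer are illustrative names.

```python
exam_ids = [1, 2, 3, 4, 5, 6, 7]  # ids present in the exam data
partial_brothers = [
    {"id": 1, "name": "Osomatsu"},
    {"id": 2, "name": "Karamatsu"},
    {"id": 3, "name": "Choromatsu"},
]

name_by_id = {b["id"]: b["name"] for b in partial_brothers}

# Every exam id is kept; dict.get returns None when there is no matching brother.
outer = [{"id": i, "name": name_by_id.get(i)} for i in exam_ids]

print(outer[3])  # {'id': 4, 'name': None}
```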
Group
    $ nl groupby.jql
         1  [
         2    { id: 1, name: "Osomatsu", class: 1 },
         3    { id: 2, name: "Karamatsu", class: 1 },
         4    { id: 3, name: "Choromatsu", class: 2 },
         5    { id: 4, name: "Ichimatsu", class: 2 },
         6    { id: 5, name: "Jyushimatsu", class: 2 },
         7    { id: 6, name: "Todomatsu", class: 3 }
         8  ] -> write(hdfs("brothers_with_class"));
         9  $exam = read(hdfs("exam"));
        10  $brothers = read(hdfs("brothers_with_class"));
        11  join $exam, $brothers where $exam.id == $brothers.id
        12    into { class: $brothers.class, math: $exam.math, science: $exam.science }
        13    -> group by $class_group = $.class
        14      into { class: $class_group, members: count( $[*].id ),
        15             avg_math: avg( $[*].math ), avg_science: avg( $[*].science ) };

    $ jaqlshell -cb groupby.jql
    {
      "location": "brothers_with_class",
      "type": "hdfs"
    }
    [
      { "class": 1, "members": 2, "avg_math": 50, "avg_science": 65 },
      { "class": 2, "members": 3, "avg_math": 46, "avg_science": 60 },
      { "class": 3, "members": 1, "avg_math": 60, "avg_science": 90 }
    ]
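group collects the entries that share the same value of a given field and outputs one entry per group. All entries in a group are referenced by $[*], to which aggregate functions such as count, sum, and avg are applied. Note that avg_math for class 2 is reported as 46, although (20 + 40 + 80) / 3 is about 46.67, so avg appears to truncate when its inputs are integers.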
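The grouping step can be sketched in Python by collecting records into a dictionary keyed on class and then aggregating each list, which plays the role of $[*]. Integer division (//) is used here only to mirror the truncated average seen in the output above; that choice, and the names class_by_id, groups, and summary, are assumptions of this sketch rather than a statement about Jaql's avg.

```python
from collections import defaultdict

exam = [
    {"id": 1, "math": 10, "science": 100},
    {"id": 2, "math": 90, "science": 30},
    {"id": 3, "math": 20, "science": 80},
    {"id": 4, "math": 40, "science": 60},
    {"id": 5, "math": 80, "science": 40},
    {"id": 6, "math": 60, "science": 90},
    {"id": 7, "math": 70, "science": 10},
]
# id -> class, taken from brothers_with_class.
class_by_id = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3}

# Join and group in one pass: each list in `groups` corresponds to $[*].
groups = defaultdict(list)
for e in exam:
    if e["id"] in class_by_id:  # inner join drops id 7
        groups[class_by_id[e["id"]]].append(e)

# One output record per group, with count and (truncating) averages.
summary = [
    {
        "class": c,
        "members": len(rows),
        "avg_math": sum(r["math"] for r in rows) // len(rows),
        "avg_science": sum(r["science"] for r in rows) // len(rows),
    }
    for c, rows in sorted(groups.items())
]

print(summary[1])  # {'class': 2, 'members': 3, 'avg_math': 46, 'avg_science': 60}
```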