めもめも

The content of this blog represents my personal views and does not necessarily represent the positions, strategies, or opinions of my employer.

Implementing a counter service with Bigtable

Cloud Datastore is the App Engine developer's best friend, yay!

However, implementing an incremental counter (like AUTO_INCREMENT in MySQL) on top of Datastore can be awkward. A good alternative is to use Bigtable to implement a counter service.

Here is a prototype of such a counter service. getCount() returns an auto-incrementing integer starting from 1, with strong consistency. In this sample, five goroutines each fetch 100 counts, for 500 counts in total.

Before running this on Cloud Shell, you need to create a Bigtable instance and install the Go client library with the following command.

$ go get -u all

main.go

package main

import (
        "encoding/binary"
        "flag"
        "log"
        "sync"

        "cloud.google.com/go/bigtable"
        "golang.org/x/net/context"
)
// User-provided constants.
const (
        tableName        = "CounterTable"
        columnFamilyName = "count"
        columnName       = ""
        rowName          = "mycounter"
)

// sliceContains reports whether the provided string is present in the given slice of strings.
func sliceContains(list []string, target string) bool {
        for _, s := range list {
                if s == target {
                        return true
                }
        }
        return false
}

func main() {
        project := flag.String("project", "", "The Google Cloud Platform project ID. Required.")
        instance := flag.String("instance", "", "The Google Cloud Bigtable instance ID. Required.")
        flag.Parse()

        for _, f := range []string{"project", "instance"} {
                if flag.Lookup(f).Value.String() == "" {
                        log.Fatalf("The %s flag is required.", f)
                }
        }

        ctx := context.Background()
        adminClient, err := bigtable.NewAdminClient(ctx, *project, *instance)
        if err != nil {
                log.Fatalf("Could not create admin client: %v", err)
        }

        tables, err := adminClient.Tables(ctx)
        if err != nil {
                log.Fatalf("Could not fetch table list: %v", err)
        }

        if !sliceContains(tables, tableName) {
                log.Printf("Creating table %s", tableName)
                if err := adminClient.CreateTable(ctx, tableName); err != nil {
                        log.Fatalf("Could not create table %s: %v", tableName, err)
                }
        }

        tblInfo, err := adminClient.TableInfo(ctx, tableName)
        if err != nil {
                log.Fatalf("Could not read info for table %s: %v", tableName, err)
        }

        if !sliceContains(tblInfo.Families, columnFamilyName) {
                if err := adminClient.CreateColumnFamily(ctx, tableName, columnFamilyName); err != nil {
                        log.Fatalf("Could not create column family %s: %v", columnFamilyName, err)
                }
        }

        client, err := bigtable.NewClient(ctx, *project, *instance)
        if err != nil {
                log.Fatalf("Could not create data operations client: %v", err)
        }

        tbl := client.Open(tableName)
        // Initialize the counter cell with an 8-byte big-endian zero.
        mut := bigtable.NewMutation()
        mut.Set(columnFamilyName, columnName, bigtable.Now(), []byte{0, 0, 0, 0, 0, 0, 0, 0})

        if err := tbl.Apply(ctx, rowName, mut); err != nil {
                log.Fatalf("Could not create row: %v", err)
        }

        // getCount atomically increments the counter with ReadModifyWrite
        // and returns the new value.
        getCount := func() uint64 {
                rmw := bigtable.NewReadModifyWrite()
                rmw.Increment(columnFamilyName, columnName, 1)
                row, err := tbl.ApplyReadModifyWrite(ctx, rowName, rmw)
                if err != nil {
                        log.Fatalf("Could not update row: %s %v", row, err)
                }
                data := binary.BigEndian.Uint64(row[columnFamilyName][0].Value)
                return data
        }

        var wg sync.WaitGroup
        for i := 0; i < 5; i++ {
                wg.Add(1)
                go func(id int) {
                        log.Printf("Start client:%d", id)
                        for j := 1; j < 101; j++ {
                                data := getCount()
                                log.Printf("\tclient:%d\t%s = %d\n", id, rowName, data)
                        }
                        log.Printf("Finished client:%d", id)
                        wg.Done()
                }(i)
        }
        wg.Wait()
        log.Printf("Finished.")
}
$ go run main.go -project bt-counter -instance mycounter
2016/11/11 10:55:54 Start client:4
2016/11/11 10:55:54 Start client:0
2016/11/11 10:55:54 Start client:1
2016/11/11 10:55:54 Start client:2
2016/11/11 10:55:54 Start client:3
2016/11/11 10:55:54     client:2        mycounter = 1
2016/11/11 10:55:54     client:1        mycounter = 2
2016/11/11 10:55:54     client:2        mycounter = 3
2016/11/11 10:55:54     client:0        mycounter = 4
2016/11/11 10:55:54     client:1        mycounter = 5
2016/11/11 10:55:54     client:2        mycounter = 6
2016/11/11 10:55:54     client:3        mycounter = 7
2016/11/11 10:55:54     client:0        mycounter = 8
2016/11/11 10:55:54     client:4        mycounter = 9
2016/11/11 10:55:54     client:2        mycounter = 10
2016/11/11 10:55:54     client:1        mycounter = 11
2016/11/11 10:55:54     client:3        mycounter = 12
(... snip ...)
2016/11/11 10:55:55     client:2        mycounter = 473
2016/11/11 10:55:55 Finished client:2
2016/11/11 10:55:55     client:1        mycounter = 474
2016/11/11 10:55:55     client:4        mycounter = 475
2016/11/11 10:55:55     client:0        mycounter = 476
2016/11/11 10:55:55     client:3        mycounter = 477
2016/11/11 10:55:55     client:4        mycounter = 478
2016/11/11 10:55:55     client:1        mycounter = 479
2016/11/11 10:55:55     client:0        mycounter = 480
2016/11/11 10:55:55     client:4        mycounter = 481
2016/11/11 10:55:55     client:3        mycounter = 482
2016/11/11 10:55:55     client:1        mycounter = 483
2016/11/11 10:55:55     client:0        mycounter = 484
2016/11/11 10:55:55     client:4        mycounter = 485
2016/11/11 10:55:55     client:3        mycounter = 486
2016/11/11 10:55:55     client:1        mycounter = 487
2016/11/11 10:55:55     client:0        mycounter = 488
2016/11/11 10:55:55     client:1        mycounter = 491
2016/11/11 10:55:55 Finished client:1
2016/11/11 10:55:55     client:3        mycounter = 490
2016/11/11 10:55:55     client:4        mycounter = 489
2016/11/11 10:55:55     client:0        mycounter = 492
2016/11/11 10:55:55     client:3        mycounter = 493
2016/11/11 10:55:55     client:4        mycounter = 494
2016/11/11 10:55:55 Finished client:4
2016/11/11 10:55:55     client:0        mycounter = 495
2016/11/11 10:55:55     client:3        mycounter = 496
2016/11/11 10:55:55     client:0        mycounter = 497
2016/11/11 10:55:55     client:3        mycounter = 498
2016/11/11 10:55:55     client:0        mycounter = 499
2016/11/11 10:55:55 Finished client:0
2016/11/11 10:55:55     client:3        mycounter = 500
2016/11/11 10:55:55 Finished client:3
2016/11/11 10:55:55 Finished.

It works! Now you can create an API service providing counters, running on App Engine. You may need to use the flexible environment, since the Go client library is not supported on the standard environment at the time of writing.

Disclaimer: All code snippets are released under Apache 2.0 License. This is not an official Google product.

Notes on Eventual Consistency in Cloud Datastore (google.cloud client edition)

What this is about

Queries against Cloud Datastore are guaranteed to be strongly consistent when you use an "Ancestor Query". These are assorted notes on what can happen when you do not use an ancestor query. I assume access to Datastore from a GCE VM using the google.cloud client (the ndb client is currently limited to GAE).

What is an Ancestor Query?

Each entity stored in Cloud Datastore can designate a "parent" entity, forming tree-shaped groups; one tree is called an "entity group". An ancestor query is a query that specifies an "ancestor key" as a search condition, limiting the search scope to the entities hanging under that key. (A parent entity does not have to be the same kind as its children, so note that an entity group can contain a mix of entities of various kinds.)

If, on the other hand, you do not specify an ancestor key, all entities stored in Datastore become the search target. This is called a "global query".

Global Query by property

First, the following script creates an entity every second. Besides the entity's ID (key name string), it stores the same ID string in a property 'myid', and the timestamp at which the entity was created in a property 'timestamp'.

writer.py

#!/usr/bin/python

from google.cloud import datastore
import time, datetime, os, uuid

project_id = os.environ.get('PROJECT_ID')
ds = datastore.Client(project_id)

if os.path.exists('/tmp/tmp0'):
    os.remove('/tmp/tmp0')
for i in range(1000):
    myid = str(uuid.uuid4())
    key = ds.key('Kind01', myid)
    ent = datastore.Entity(key)
    ts = datetime.datetime.now()
    ent.update({'timestamp': ts, 'myid': myid})
    ds.put(ent)
    with open('/tmp/tmp0', 'a') as file:
        file.write('%s\n' % myid)
    print ts, myid 
    time.sleep(1)

Running it looks like this:

$ ./writer.py 
2016-10-26 00:50:27.589180 36a1fca7-a4f8-4d74-b2e0-5a8521f68cff
2016-10-26 00:50:28.749832 e1cb00a0-9ccc-4558-b7d6-2bb41b10be56
2016-10-26 00:50:29.866750 8245581d-2b0c-4888-90e4-710f683837e9
2016-10-26 00:50:30.950333 b1058f07-3c58-4ad1-8ea1-c4191be2bcbe
2016-10-26 00:50:32.061670 9874503a-7318-4d96-a0ca-78668f736205
2016-10-26 00:50:33.229060 be9fd3f0-d2e6-4e53-a748-4d26b766e7d8
...

While it runs, entity IDs are appended to the file /tmp/tmp0.

At the same time, run the following script, which takes the IDs written to /tmp/tmp0 and issues a query (a global query) with the condition 'myid' = ID.

reader.py

#!/usr/bin/python

from google.cloud import datastore
import time, datetime, os
import subprocess

project_id = os.environ.get('PROJECT_ID')
ds = datastore.Client(project_id)

f = subprocess.Popen(['tail','-F','/tmp/tmp0'],
                     stdout=subprocess.PIPE,stderr=subprocess.PIPE)

while True:
    myid = f.stdout.readline().strip()
    query = ds.query(kind='Kind01')
    query.add_filter('myid', '=', myid)
    ts = datetime.datetime.now()
    print ts, 'Trying to find ', myid
    while True:
        iter = query.fetch()
        ent, _, _ = iter.next_page()
        if len(ent) == 0:
          print 'Failed and retry...'
          continue
        ts = datetime.datetime.now()
        print ts, 'Succeeded.'
        print ent[0]['timestamp'], myid
        break

Running it looks like this:

2016-10-26 00:50:27.748414 Trying to find  36a1fca7-a4f8-4d74-b2e0-5a8521f68cff
Failed and retry...
Failed and retry...
Failed and retry...
Failed and retry...
2016-10-26 00:50:27.948764 Succeeded.
2016-10-26 00:50:27.589180+00:00 36a1fca7-a4f8-4d74-b2e0-5a8521f68cff
2016-10-26 00:50:28.865377 Trying to find  e1cb00a0-9ccc-4558-b7d6-2bb41b10be56
Failed and retry...
Failed and retry...
Failed and retry...
Failed and retry...
2016-10-26 00:50:29.066650 Succeeded.
2016-10-26 00:50:28.749832+00:00 e1cb00a0-9ccc-4558-b7d6-2bb41b10be56
2016-10-26 00:50:29.948932 Trying to find  8245581d-2b0c-4888-90e4-710f683837e9
Failed and retry...
2016-10-26 00:50:30.004495 Succeeded.
2016-10-26 00:50:29.866750+00:00 8245581d-2b0c-4888-90e4-710f683837e9
2016-10-26 00:50:31.060321 Trying to find  b1058f07-3c58-4ad1-8ea1-c4191be2bcbe
2016-10-26 00:50:31.177282 Succeeded.
2016-10-26 00:50:30.950333+00:00 b1058f07-3c58-4ad1-8ea1-c4191be2bcbe

You can see that the query sometimes fails, depending on timing. This shows that a global query may not find an entity immediately after it was created. (In this example, entities are found within one second.)

(Why doesn't this happen with an ancestor query? As part of the Datastore implementation, when an ancestor query is issued, the target entity group is internally checked for data-synchronization state. With a global query, the search target is every entity, so this synchronization check would be too expensive to perform; the search is therefore eventually consistent.)

Fetching an entity by ID

The following script fetches entities by explicitly specifying the ID rather than querying on a property.

reader2.py

#!/usr/bin/python

from google.cloud import datastore
import time, datetime, os
import subprocess

project_id = os.environ.get('PROJECT_ID')
ds = datastore.Client(project_id)

f = subprocess.Popen(['tail','-F','/tmp/tmp0'],
                     stdout=subprocess.PIPE,stderr=subprocess.PIPE)

while True:
    myid = f.stdout.readline().strip()
    key = ds.key('Kind01', myid)
    ts = datetime.datetime.now()
    print ts, 'Trying to find ', myid
    while True:
        ent = ds.get(key)
        if not ent:
            print 'Failed and retry...'
            continue
        ts = datetime.datetime.now()
        print ts, 'Succeeded.'
        print ent['timestamp'], myid
        break

Running this gives the following:

$ ./reader2.py
2016-10-26 00:59:28.266182 Trying to find  2d4ad50f-488c-466d-8273-3533c95de6ed
2016-10-26 00:59:28.378003 Succeeded.
2016-10-26 00:59:28.065682+00:00 2d4ad50f-488c-466d-8273-3533c95de6ed
2016-10-26 00:59:29.422306 Trying to find  1e1b6e94-0697-4a8f-9b2c-96696085e429
2016-10-26 00:59:29.560694 Succeeded.
2016-10-26 00:59:29.267432+00:00 1e1b6e94-0697-4a8f-9b2c-96696085e429
2016-10-26 00:59:30.570010 Trying to find  a159836f-cdf5-4148-a305-9b2f01e8f7d0
2016-10-26 00:59:30.679568 Succeeded.
2016-10-26 00:59:30.423487+00:00 a159836f-cdf5-4148-a305-9b2f01e8f7d0
2016-10-26 00:59:31.701340 Trying to find  add815ed-7cb9-4e61-8377-38eca8dac14d
2016-10-26 00:59:31.817533 Succeeded.
2016-10-26 00:59:31.571264+00:00 add815ed-7cb9-4e61-8377-38eca8dac14d
2016-10-26 00:59:32.844253 Trying to find  9ce71c69-d5e0-48c1-a065-ec41aad386d3
2016-10-26 00:59:32.919830 Succeeded.
2016-10-26 00:59:32.702581+00:00 9ce71c69-d5e0-48c1-a065-ec41aad386d3
2016-10-26 00:59:33.908359 Trying to find  5c250b18-4b03-45c7-bbc4-c8a703737148
2016-10-26 00:59:33.926057 Succeeded.
2016-10-26 00:59:33.845533+00:00 5c250b18-4b03-45c7-bbc4-c8a703737148

In this case you get strong consistency: the latest contents of the entity are always retrieved.

The Ancestor Query case

When you form an entity group and issue an ancestor query specifying the parent entity, even freshly created entities should always be found, but let's confirm this just in case.

Here we build the following entity group:

Kind00: 'grandpa'
    |
    |
Kind00: 'father'
    |
    ---------------------
    |                |
Kind01: uuid     Kind01: uuid ...

The script that creates the entities looks like this. (The entities corresponding to Kind00: 'grandpa' and Kind00: 'father' are never actually created, but that is not a problem: the "parent key" is really only used as a lookup key for the entity group's local index.)

writer3.py

#!/usr/bin/python

from google.cloud import datastore
import time, datetime, os, uuid

project_id = os.environ.get('PROJECT_ID')
ds = datastore.Client(project_id)

if os.path.exists('/tmp/tmp0'):
    os.remove('/tmp/tmp0')

for i in range(1000):
    myid = str(uuid.uuid4())
    parent_key = ds.key('Kind00', 'grandpa', 'Kind00', 'father')
    key = ds.key('Kind01', myid, parent=parent_key)
    ent = datastore.Entity(key)
    ts = datetime.datetime.now()
    ent.update({'timestamp': ts, 'myid': myid})
    ds.put(ent)
    with open('/tmp/tmp0', 'a') as file:
        file.write('%s\n' % myid)
    print ts, myid 
    time.sleep(1)
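To make the key structure above concrete: a flat Datastore key path is just an alternating sequence of kinds and names. A minimal, library-free sketch (the helper `split_key_path` is purely illustrative and not part of the google.cloud client):

```python
def split_key_path(*flat_path):
    """Split a flat key path like ('Kind00', 'grandpa', 'Kind01', 'abc')
    into (kind, name) pairs, ancestor-first."""
    if len(flat_path) % 2 != 0:
        raise ValueError('a key path needs an even number of elements')
    return [(flat_path[i], flat_path[i + 1]) for i in range(0, len(flat_path), 2)]

# The ancestor path used in writer3.py, with one child entity appended:
path = split_key_path('Kind00', 'grandpa', 'Kind00', 'father', 'Kind01', 'some-uuid')
print(path)
# [('Kind00', 'grandpa'), ('Kind00', 'father'), ('Kind01', 'some-uuid')]
```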

The script that queries the entities looks like this. Here the immediate parent ds.key('Kind00', 'grandpa', 'Kind00', 'father') is used as the starting point of the search, but of course you could also start from ds.key('Kind00', 'grandpa') above it.

reader3.py

#!/usr/bin/python

from google.cloud import datastore
import time, datetime, os
import subprocess

project_id = os.environ.get('PROJECT_ID')
ds = datastore.Client(project_id)

f = subprocess.Popen(['tail','-F','/tmp/tmp0'],
                     stdout=subprocess.PIPE,stderr=subprocess.PIPE)

while True:
    myid = f.stdout.readline().strip()
    ancestor_key = ds.key('Kind00', 'grandpa', 'Kind00', 'father')
    query = ds.query(kind='Kind01', ancestor=ancestor_key)
    query.add_filter('myid', '=', myid)
    ts = datetime.datetime.now()
    print ts, 'Trying to find ', myid
    while True:
        iter = query.fetch()
        ent, _, _ = iter.next_page()
        if len(ent) == 0:
          print 'Failed and retry...'
          continue
        ts = datetime.datetime.now()
        print ts, 'Succeeded.'
        print ent[0]['timestamp'], myid
        break

The result is as follows: as expected, nothing is missed.

$ ./reader3.py 
2016-11-02 07:21:02.236359 Trying to find  6b77f8da-0775-4f77-980e-14c5c59a451f
2016-11-02 07:21:02.392515 Succeeded.
2016-11-02 07:21:01.482085+00:00 6b77f8da-0775-4f77-980e-14c5c59a451f
2016-11-02 07:21:02.634073 Trying to find  3681841d-72e1-43ac-9d91-0dfcbf137713
2016-11-02 07:21:02.662415 Succeeded.
2016-11-02 07:21:02.582529+00:00 3681841d-72e1-43ac-9d91-0dfcbf137713
2016-11-02 07:21:03.682064 Trying to find  164ff921-bf65-4835-9a3b-6c0a5e7948d4
2016-11-02 07:21:03.719449 Succeeded.
2016-11-02 07:21:03.635356+00:00 164ff921-bf65-4835-9a3b-6c0a5e7948d4
2016-11-02 07:21:04.777847 Trying to find  eeb6702b-2f4b-4254-8957-1db2ae52a23e
2016-11-02 07:21:04.805840 Succeeded.
2016-11-02 07:21:04.683344+00:00 eeb6702b-2f4b-4254-8957-1db2ae52a23e
2016-11-02 07:21:05.852476 Trying to find  a514fe27-37df-4639-b269-5192b926624b
2016-11-02 07:21:05.946797 Succeeded.

Cloud ML Super Quick Tour

cloud.google.com

Background

Google Cloud ML is now available as a Beta release (as of 2016/10/11). Super simply stated, you can do the following things using Cloud ML.

 (1) Train your custom TensorFlow models on GCP.
 (2) Serve prediction API with your custom models.

Regarding the custom model training, you can use useful features such as hyper-parameter tuning and distributed training, but in this post, I will show you the minimum steps to migrate your existing TensorFlow models to Cloud ML. As an example, I will use the following code. It classifies the MNIST dataset with a single layer neural network.

MNIST single layer network.ipynb

Modification to the existing code

First, you have to create a library by putting all files in a single directory. If you have a single executable file 'task.py', your library directory is something like this:

trainer/
├── __init__.py   # Empty file
└── task.py       # Executable file

The names of the library directory and the executable file are arbitrary.

Then you will add the following code at the end of the executable file:

if __name__ == '__main__':
    tf.app.run()

The run() method implicitly calls the main() function. You also need to use Cloud Storage to exchange files with the runtime environment; this is done by specifying Cloud Storage URIs ("gs://...") as file paths in your code. Since you will typically test the code on your local machine before submitting it to Cloud ML, it is better to make these file paths configurable through command-line options. The following are the typical directories you need to consider:

  • Directory to store checkpoint files during the training.
  • Directory to store the trained model binary (The filename should be 'export'.)
  • Directory to store log data for TensorBoard.
  • Directory to store training data.

Note that you don't necessarily have to use Cloud Storage for the training data; you can also feed training data from other sources such as Cloud Dataflow.
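One convenience worth noting: because "gs://..." URIs are plain strings, the same path-building code can serve both a local test run and a Cloud ML run. For example, on a POSIX system os.path.join concatenates them just like local paths (the bucket and job names below are hypothetical):

```python
import os

# Works identically for a local directory and a Cloud Storage URI.
train_dir = 'gs://my-bucket/job01/train'
checkpoint_path = os.path.join(train_dir, 'checkpoint')
print(checkpoint_path)  # gs://my-bucket/job01/train/checkpoint
```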

In this example, I will make the entry point of my code 'task.py' like this:

def main(_):
    parser = argparse.ArgumentParser()
    parser.add_argument('--train_dir', type=str, default='/tmp/train')  # Checkpoint file
    parser.add_argument('--model_dir', type=str, default='/tmp/model')  # Model file
    parser.add_argument('--train_step', type=int, default=2000)         # Training steps
    args, _ = parser.parse_known_args()
    run_training(args)

if __name__ == '__main__':
    tf.app.run()

This lets users specify the directories for checkpoint files and the model binary with the command-line options '--train_dir' and '--model_dir', and the number of training iterations with '--train_step'. In this example, the training data is fetched directly from the Internet by the TensorFlow library.
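A small but important detail: the code uses parse_known_args() rather than parse_args(), so that extra flags injected by the job runner don't make the parser abort. A standalone illustration (the flag '--some_extra_flag' is just a stand-in for an unrecognized option):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--train_dir', type=str, default='/tmp/train')

# parse_known_args() returns the recognized options plus the leftover
# arguments instead of raising an error on unknown flags.
args, unknown = parser.parse_known_args(
    ['--train_dir', 'gs://bucket/job01/train', '--some_extra_flag', '1'])

print(args.train_dir)  # gs://bucket/job01/train
print(unknown)         # ['--some_extra_flag', '1']
```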

In addition, as a point particular to Cloud ML, you have to specify the input/output objects for the prediction API service using TensorFlow's Collection object. A Collection is a generic container for arbitrary key-value style data. In Cloud ML, you store placeholders as API inputs under the key 'inputs', and prediction value objects as API outputs under the key 'outputs', like this:

input_key = tf.placeholder(tf.int64, [None,])
x = tf.placeholder(tf.float32, [None, 784])

inputs = {'key': input_key.name, 'image': x.name}
tf.add_to_collection('inputs', json.dumps(inputs))

p = tf.nn.softmax(tf.matmul(hidden1, w0) + b0)
output_key = tf.identity(input_key)

outputs = {'key': output_key.name, 'scores': p.name}
tf.add_to_collection('outputs', json.dumps(outputs))

More precisely, you create dictionaries containing the name attributes of the input/output objects and store their JSON serializations in the Collection object using tf.add_to_collection(). The keys of the dictionaries become the attribute names in the API request/response. In this case, in addition to the input image 'x' and the prediction result 'p' (a list of probabilities for each category), 'input_key' and 'output_key' are included in the input/output objects; 'output_key' simply returns the same value as 'input_key'. When you send multiple entries to the prediction API, these key values let you match the entries in the response to the entries in the request.
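Since nothing guarantees that the predictions come back in request order, the key is what lets you re-associate each response entry with its request. A minimal client-side sketch (the request/response data here is made up for illustration; the real response format is shown later in this post):

```python
# Hypothetical request entries and a response that came back reordered.
requests = [{'key': 0, 'image': [0.0] * 784},
            {'key': 1, 'image': [0.5] * 784}]
predictions = [{'key': 1, 'scores': [0.9, 0.1]},
               {'key': 0, 'scores': [0.2, 0.8]}]

# Index the predictions by key, then look each request up by its own key.
scores_by_key = {p['key']: p['scores'] for p in predictions}
for req in requests:
    print(req['key'], scores_by_key[req['key']])
```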

That's all. The following is the modified code considering what I have explained so far:

task.py

import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
import argparse, os, json
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

def run_training(args):
    # Define filepath for checkpoint and final model
    checkpoint_path = os.path.join(args.train_dir, 'checkpoint')
    model_path = os.path.join(args.model_dir, 'export') # Filename should be 'export'.
    num_units = 1024
    
    x = tf.placeholder(tf.float32, [None, 784])
    
    w1 = tf.Variable(tf.truncated_normal([784, num_units]))
    b1 = tf.Variable(tf.zeros([num_units]))
    hidden1 = tf.nn.relu(tf.matmul(x, w1) + b1)
    
    w0 = tf.Variable(tf.zeros([num_units, 10]))
    b0 = tf.Variable(tf.zeros([10]))
    p = tf.nn.softmax(tf.matmul(hidden1, w0) + b0)
    
    t = tf.placeholder(tf.float32, [None, 10])
    loss = -tf.reduce_sum(t * tf.log(p))
    train_step = tf.train.AdamOptimizer().minimize(loss)
    correct_prediction = tf.equal(tf.argmax(p, 1), tf.argmax(t, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # Define key element
    input_key = tf.placeholder(tf.int64, [None,], name='key')
    output_key = tf.identity(input_key)

    # Define API inputs/outputs objects
    inputs = {'key': input_key.name, 'image': x.name}
    outputs = {'key': output_key.name, 'scores': p.name}
    tf.add_to_collection('inputs', json.dumps(inputs))
    tf.add_to_collection('outputs', json.dumps(outputs))
    
    saver = tf.train.Saver()
    sess = tf.InteractiveSession()
    sess.run(tf.initialize_all_variables())

    i = 0
    for _ in range(args.train_step):
        i += 1
        batch_xs, batch_ts = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, t: batch_ts})
        if i % 100 == 0:
            loss_val, acc_val = sess.run([loss, accuracy],
                feed_dict={x:mnist.test.images, t: mnist.test.labels})
            print ('Step: %d, Loss: %f, Accuracy: %f'
                   % (i, loss_val, acc_val))
            saver.save(sess, checkpoint_path, global_step=i)

    # Export the final model.
    saver.save(sess, model_path)


def main(_):
    parser = argparse.ArgumentParser()
    parser.add_argument('--train_dir', type=str, default='/tmp/train')  # Checkpoint directory
    parser.add_argument('--model_dir', type=str, default='/tmp/model')  # Model directory
    parser.add_argument('--train_step', type=int, default=2000)         # Training steps
    args, _ = parser.parse_known_args()
    run_training(args)


if __name__ == '__main__':
    tf.app.run()

Running the code on Cloud ML

To submit a job to Cloud ML, you need a local machine with Cloud ML SDK, or you can use Cloud Shell as a local environment. I will use Cloud Shell here. Please refer to the official document for other environments.

First, create a new project and enable the Cloud ML API through the API Manager. Then launch Cloud Shell and install the SDK:

$ curl https://storage.googleapis.com/cloud-ml/scripts/setup_cloud_shell.sh | bash
$ export PATH=${HOME}/.local/bin:${PATH}
$ curl https://storage.googleapis.com/cloud-ml/scripts/check_environment.py | python
Success! Your environment is configured correctly.

The following command grants the 'editor' role on the project to a service account. This is necessary for submitting jobs using the service account.

$ gcloud beta ml init-project

Prepare the TensorFlow code (which I explained in the previous section) in the 'trainer' directory under your home directory.

trainer/
├── __init__.py   # Empty file
└── task.py       # Executable file

Before submitting a job, run the code locally with a small number of iterations to check that there are no obvious mistakes.

$ mkdir -p /tmp/train /tmp/model
$ cd $HOME
$ python -m trainer.task --train_step=200
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Step: 100, Loss: 3183.995850, Accuracy: 0.903500
Step: 200, Loss: 2237.709229, Accuracy: 0.934500

$ ls -l /tmp/train /tmp/model/
/tmp/model/:
total 9584
-rw-r--r-- 1 enakai enakai     203 Oct  5 17:14 checkpoint
-rw-r--r-- 1 enakai enakai 9770436 Oct  5 17:14 export
-rw-r--r-- 1 enakai enakai   35514 Oct  5 17:14 export.meta
/tmp/train:
total 28744
-rw-r--r-- 1 enakai enakai     163 Oct  5 17:14 checkpoint
-rw-r--r-- 1 enakai enakai 9770436 Oct  5 17:14 checkpoint-100
-rw-r--r-- 1 enakai enakai   35514 Oct  5 17:14 checkpoint-100.meta
-rw-r--r-- 1 enakai enakai 9770436 Oct  5 17:14 checkpoint-200
-rw-r--r-- 1 enakai enakai   35514 Oct  5 17:14 checkpoint-200.meta

Looks good. Now let's run the code in the cloud. First, create a Cloud Storage bucket to store data. The bucket name can be arbitrary, but it is a good convention to include the project ID.

$ PROJECT_ID=project01 # your project ID
$ TRAIN_BUCKET="gs://$PROJECT_ID-mldata"
$ gsutil mb $TRAIN_BUCKET

Decide the job name ('job01' in this example), and submit it to Cloud ML.

$ JOB_NAME="job01"
$ touch .dummy
$ gsutil cp .dummy $TRAIN_BUCKET/$JOB_NAME/train/
$ gsutil cp .dummy $TRAIN_BUCKET/$JOB_NAME/model/
$ gcloud beta ml jobs submit training $JOB_NAME \
  --region=us-central1 \
  --package-path=trainer --module-name=trainer.task \
  --staging-bucket=$TRAIN_BUCKET \
  -- \
  --train_dir="$TRAIN_BUCKET/$JOB_NAME/train" \
  --model_dir="$TRAIN_BUCKET/$JOB_NAME/model"

createTime: '2016-10-05T08:53:35Z'
jobId: job01
state: QUEUED
trainingInput:
  args:
  - --train_dir=gs://project01/job01/train
  - --model_dir=gs://project01/job01/model
  packageUris:
  - gs://project01/cloudmldist/1475657612/trainer-0.0.0.tar.gz
  pythonModule: trainer.task
  region: us-central1

A folder 'cloudmldist' will be created under the bucket specified with '--staging-bucket', and your code will be placed under it; Cloud ML then starts executing it. In the steps above, you explicitly created the folders for checkpoint files and the model binary with the gsutil command. You can automate this in your code if you prefer.

Monitor the job execution with the following command:

$ watch -n1 gcloud beta ml jobs describe --project $PROJECT_ID $JOB_NAME
createTime: '2016-10-05T08:53:35Z'
jobId: job01
startTime: '2016-10-05T08:53:45Z'
state: RUNNING
trainingInput:
  args:
  - --train_dir=gs://project01/job01/train
  - --model_dir=gs://project01/job01/model
  packageUris:
  - gs://project01/cloudmldist/1475657612/trainer-0.0.0.tar.gz
  pythonModule: trainer.task
  region: us-central1

The 'state' becomes 'SUCCEEDED' when the job has completed. You can see the stdout/stderr logs on Stackdriver's log management console by selecting the 'Cloud Machine Learning' log.

On successful job completion, the model binary 'export' is created as below:

$ gsutil ls $TRAIN_BUCKET/$JOB_NAME/model/export*
gs://project01/job01/model/export
gs://project01/job01/model/export.meta

Serving your model with the prediction API

Now you can start the prediction API service using the trained model binary 'export' by executing the following commands:

$ MODEL_NAME="MNIST"
$ gcloud beta ml models create $MODEL_NAME
$ gcloud beta ml versions create \
  --origin=$TRAIN_BUCKET/$JOB_NAME/model --model=$MODEL_NAME v1
$ gcloud beta ml versions set-default --model=$MODEL_NAME v1

You specify the model name with the environment variable 'MODEL_NAME', and you can manage multiple versions of a model. In this case, you created the service with the 'v1' version of the model and made it the default version.

You need to wait a few minutes until the service becomes available. In the meantime, let's create a test dataset with the following Python script:

import json
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
with open("data.json", "w") as file:
    for i in range(10):
        data = {"image": mnist.test.images[i].tolist(), "key": i}
        file.write(json.dumps(data)+'\n')

It generates a JSON file 'data.json' containing one image and key value pair per line. You can submit the data to the prediction API with the following command:

$ gcloud beta ml predict --model=${MODEL_NAME} --json-instances=data.json
predictions:
- key: 0
  scores:
  - 2.53733e-08
  - 6.47722e-09
  - 2.23573e-06
  - 5.32844e-05
  - 3.08012e-10
  - 1.33022e-09
  - 1.55983e-11
  - 0.99991
  - 4.39428e-07
  - 3.38841e-05
- key: 1
  scores:
  - 1.98303e-08
  - 2.84799e-07
  - 0.999985
  - 1.47131e-05
  - 1.45546e-13
  - 1.90945e-09
  - 3.50033e-09
  - 2.24941e-18
  - 2.60025e-07
  - 1.45738e-14
- key: 2
  scores:
  - 3.63027e-09
...

You can see the response on the command line. Please refer to the official documentation for the URLs to submit REST requests directly.
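Each 'scores' list is a probability distribution over the ten digit classes, so the predicted label for an entry is simply the index of the largest score. A small post-processing sketch (the scores below are rounded stand-ins for the key-0 entry above):

```python
def predicted_label(scores):
    """Return the index of the highest score, i.e. the predicted digit."""
    return max(range(len(scores)), key=lambda i: scores[i])

scores_key0 = [2.5e-08, 6.5e-09, 2.2e-06, 5.3e-05, 3.1e-10,
               1.3e-09, 1.6e-11, 0.99991, 4.4e-07, 3.4e-05]
print(predicted_label(scores_key0))  # 7
```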

Note on the distributed training

In this example, I used sample code written with the low-level TensorFlow APIs, so you need additional modifications following the Distributed TensorFlow guide if you want to distribute the training jobs on Cloud ML. Unfortunately, it's not a trivial change. Some basic points are explained in the following article.

enakai00.hatenablog.com

But don't worry. The TensorFlow team is planning to provide high-level TensorFlow APIs so that your TensorFlow code can be executed in a distributed manner on Cloud ML automatically.

Stay tuned!

Disclaimer: All code snippets are released under Apache 2.0 License. This is not an official Google product.