Submitter

A submitter is a pluggable module in SQLFlow that is used to submit an ML job to a third party computation service.

Workflow

When a user types in an extended SQL statement, SQLFlow first parses and semantically verifies the statement. Then SQLFlow either runs the ML job locally or submits the ML job to a third party computation service.

In the latter case, SQLFlow produces a job description (TrainDescription or PredictDescription) and hands it over to the submitter. For a training SQL, SQLFlow produces TrainDescription; for prediction SQL, SQLFlow produces PredDescription. The concrete definition of the description looks like the following

type ColumnType struct {
    Name             string // e.g. sepal_length
    DatabaseTypeName string // e.g. FLOAT
}

// SELECT *
// FROM iris.train
// TRAIN DNNClassifier
// WITH
//   n_classes = 3,
//   hidden_units = [10, 20]
// COLUMN sepal_length, sepal_width, petal_length, petal_width
// LABEL class
// INTO sqlflow_models.my_dnn_model;
type TrainDescription struct {
    StandardSelect string       // e.g. SELECT * FROM iris.train
    Estimator      string       // e.g. DNNClassifier
    Attrs          map[string]string // e.g. "n_classes": "3", "hidden_units": "[10, 20]"
    X              []ColumnType // e.g. "sepal_length": "FLOAT", ...
    Y              ColumnType   // e.g. "class": "INT"
    ModelName      string       // e.g. my_dnn_model
}

// SELECT *
// FROM iris.test
// PREDICT iris.predict.class
// USING sqlflow_models.my_dnn_model;
type PredDescription struct {
    StandardSelect string // e.g. SELECT * FROM iris.test
    TableName      string // e.g. iris.predict
    ModelName      string // e.g. my_dnn_model
}

Submitter Interface

The submitter interface should provide two functions Train and Predict. The detailed definition can be the following

type Submitter interface {
    // Train executes a ML training job and streams job's response through writer.
    // A typical Train function should include
    // - Loading the training data
    // - Initializing the model
    // - model.train
    // - Saving the trained model to a persistent storage
    Train(desc TrainDescription, writer PipeWriter) error
    // Predict executes a ML predicting job and streams job's response through writer
    // A typical Predict function should include
    // - Loading the model from a persistent storage
    // - Loading the prediction data
    // - model.predict
    // - Writing the prediction result to a table
    Predict(desc PredictDescription, writer PipeWriter) error
}

Register a submitter

A new submitter can be added as

import (
    ".../my_submitter"
    ".../sqlflow/sql"
)

func main() {
    // ...
    sql.Register(my_submitter.NewSubmitter())
    // ...
    for {
    	sql := recv()
    	sql.Run(sql)
    }
}

where sql.Register will put my_submitter instance to package level registry. During sql.Run, it will check whether there is a submitter registered. If there is, sql.Run will run either submitter.Train or submitter.Predict.