At the moment, users can build SQLFlow into a gRPC server or a command line binary. Both forms statically link to the SQLFlow core code and inherit all dependencies, including TensorFlow, various database client libraries, and various SQL dialect parsers in the form of gRPC backend servers. The many dependencies require us to release and run both forms in Docker containers.
For convenient deployment, we want to run the SQLFlow server in containers and have end users access it via a command-line client. In particular, we want the client tool to be easy to install, so we need to prune its dependencies. As a solution, we make the client tool a statically linked Go binary that calls the SQLFlow server remotely, so users can just download and run it.
The command-line client tool, named sqlflow, is complementary to other forms of clients, like Jupyter Notebook. Users can write shell or Python scripts that call the client tool to build complex applications.
Currently, the command-line binary form of SQLFlow has the name repl, which is the name of a user interface design philosophy. Many command-line tools, including mysql and python, implement a REPL user interface. Let’s follow the convention of mysqld to name the new client sqlflow.
repl’s only dependency is the it2check bash script that determines whether to call sixel to render images. We’ll find a way to show images later. At the moment, we simply remove this dependency.
We’ll provide installation guides for Mac and Windows so that users can install sqlflow quickly. For example:
- Mac/Linux users can run the following command in a terminal to use sqlflow:
wget https://raw.githubusercontent.com/sql-machine-learning/.../sqlflow && chmod +x ./sqlflow
- Windows users can just click the download link to use sqlflow.
Fortunately, there are only a few places in the REPL codebase that require SQLFlow core.
At the moment, the SQLFlow workflow calls repl to execute a single step. After the refactoring, repl becomes a pure client, so the workflow code should call the step binary instead.
Consequently, we can safely delete the workflow-related code in repl:
We don’t need the flag --model_dir any more because the training process is running on the SQLFlow server.
We’ll define a new flag --sqlflow_server that holds the address of the SQLFlow server.
We still use godotenv as well as environment variables to hold flags that change infrequently. The two necessary config entries are:
- SQLFLOW_DATASOURCE: the alternative to --datasource. We already have this.
- SQLFLOW_SERVER: the alternative to --sqlflow_server. This is to be added.
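The relationship between the two flags and their environment-variable alternatives can be sketched as follows. This is a minimal illustration using only the Go standard library, not the actual sqlflow code: the names `config`, `firstNonEmpty`, and `parseConfig` are hypothetical, and the rule that a flag takes precedence over its environment variable is an assumption. In the real client, godotenv would presumably load a `.env` file into the environment before this resolution runs.

```go
package main

import (
	"flag"
	"fmt"
)

// config holds the two settings discussed above (hypothetical type).
type config struct {
	Server     string // address of the SQLFlow server
	DataSource string // database connection string
}

// firstNonEmpty returns the first non-empty string among its arguments.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

// parseConfig resolves each setting from the command line first and falls
// back to the environment (assumed precedence). getenv is injected so the
// logic can be exercised without touching the real environment.
func parseConfig(args []string, getenv func(string) string) (config, error) {
	fs := flag.NewFlagSet("sqlflow", flag.ContinueOnError)
	server := fs.String("sqlflow_server", "", "SQLFlow server address")
	ds := fs.String("datasource", "", "database connection string")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	cfg := config{
		Server:     firstNonEmpty(*server, getenv("SQLFLOW_SERVER")),
		DataSource: firstNonEmpty(*ds, getenv("SQLFLOW_DATASOURCE")),
	}
	if cfg.Server == "" {
		return config{}, fmt.Errorf("no server address: set --sqlflow_server or SQLFLOW_SERVER")
	}
	return cfg, nil
}

func main() {
	// A fixed, made-up environment keeps the demo deterministic;
	// a real client would pass os.Args[1:] and os.Getenv.
	env := map[string]string{"SQLFLOW_SERVER": "localhost:50051"}
	getenv := func(key string) string { return env[key] }

	cfg, err := parseConfig([]string{"--datasource=mysql://example"}, getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("server=%s datasource=%s\n", cfg.Server, cfg.DataSource)
}
```

Injecting `getenv` rather than calling `os.Getenv` directly is only a convenience for testing the fallback logic.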
We have to replace the call to step.RunSQLProgramAndPrintResult with a new function RunSQLProgram that would be defined in the sqlflow codebase. The new function should:
- Construct a client stub from the server address given by --sqlflow_server.
- Use the stub to send the SQL statement obtained from ReadStmt to the server.
- Parse the result returned from the SQLFlow server.
- The server side returns both Plotille ASCII figures and PNGs to the client. It is up to the client code to determine how to display the figures. See https://github.com/sql-machine-learning/sqlflow/blob/be7c5728f47e8d3b81893c1d712974a6ddcd5f1c/pkg/step/step.go#L69-L85
- Both the client construction code and result parsing code can be implemented by referring to the existing code in
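The steps listed above could look roughly like the sketch below. It is not the actual implementation: the types `response`, `stream`, and `stub` are hypothetical stand-ins for the messages and client generated from the SQLFlow gRPC protocol, `fakeStub` replaces a real connection so the example is self-contained, and the signature of `RunSQLProgram` (including the `canShowImages` parameter and the `emit` callback) is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// response is a hypothetical stand-in for one message streamed by the server.
type response struct {
	Text  string // table rows or log messages
	ASCII string // ASCII rendering of a figure, if any
	PNG   []byte // PNG rendering of the same figure, if any
}

// stream is a hypothetical stand-in for a gRPC response stream.
// Recv returns io.EOF after the last message.
type stream interface {
	Recv() (*response, error)
}

// stub is a hypothetical stand-in for the generated gRPC client.
type stub interface {
	Run(stmt string) (stream, error)
}

// render decides how to show one response: the client, not the server,
// picks between the PNG and the ASCII version of a figure.
func render(r *response, canShowImages bool) string {
	switch {
	case len(r.PNG) > 0 && canShowImages:
		return fmt.Sprintf("[PNG figure, %d bytes]", len(r.PNG))
	case r.ASCII != "":
		return r.ASCII
	default:
		return r.Text
	}
}

// RunSQLProgram sends one statement through the stub and passes every
// non-empty rendered response to emit.
func RunSQLProgram(c stub, stmt string, canShowImages bool, emit func(string)) error {
	s, err := c.Run(stmt)
	if err != nil {
		return fmt.Errorf("send statement: %w", err)
	}
	for {
		r, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return nil // the server finished the stream
		}
		if err != nil {
			return fmt.Errorf("receive response: %w", err)
		}
		if line := render(r, canShowImages); line != "" {
			emit(line)
		}
	}
}

// fakeStream and fakeStub replay canned responses instead of calling a server.
type fakeStream struct{ msgs []*response }

func (f *fakeStream) Recv() (*response, error) {
	if len(f.msgs) == 0 {
		return nil, io.EOF
	}
	m := f.msgs[0]
	f.msgs = f.msgs[1:]
	return m, nil
}

type fakeStub struct{ msgs []*response }

func (f fakeStub) Run(stmt string) (stream, error) {
	if strings.TrimSpace(stmt) == "" {
		return nil, errors.New("empty statement")
	}
	return &fakeStream{msgs: f.msgs}, nil
}

func main() {
	c := fakeStub{msgs: []*response{
		{Text: "1 row affected"},
		{ASCII: "<ascii figure>", PNG: []byte{0x89, 'P', 'N', 'G'}},
	}}
	err := RunSQLProgram(c, "SELECT 1;", false, func(s string) { fmt.Println(s) })
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Keeping the display decision in a small function such as `render` would let the client later add image support without touching the receive loop.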
The original implementation of auto-complete is based on the pkg/sql/attribute package. In a pure client such as sqlflow, we can depend on such packages only partly, because the client may be a different version from the server.
To solve this problem, the server protocol should implement a new RPC method to pass the auto-completion dictionaries. The main function of sqlflow would call this new method to get the dictionaries.
- The dictionaries for stable models still link attribute.PremadeModelParamsDocs for canned estimators and attribute.OptimizerParamsDocs for TensorFlow optimizers and XGBoost objectives.
- The dictionaries for volatile model packages will be defined in a repeated field in the response message.
- Models from
- Models from SQLFlow Model Zoo.
- This would be considered later.
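One way the client might combine the two sources of dictionaries is sketched below. The types `paramDocs` and `modelDoc`, the sample entries, and the rule that built-in entries win on a name clash are all assumptions made for illustration. The actual shape of the dictionaries in the attribute package and of the repeated field in the response message may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// paramDocs maps a model name to its parameters and their one-line docs
// (hypothetical shape).
type paramDocs map[string]map[string]string

// modelDoc is a hypothetical element of the repeated field that the new
// RPC method would return for volatile model packages.
type modelDoc struct {
	Model  string
	Params map[string]string
}

// mergeDocs returns a new dictionary holding the built-in entries plus the
// entries fetched from the server. On a name clash the built-in entry is
// kept (an assumption, not a documented rule).
func mergeDocs(builtin paramDocs, remote []modelDoc) paramDocs {
	merged := make(paramDocs, len(builtin)+len(remote))
	for name, params := range builtin {
		merged[name] = params
	}
	for _, d := range remote {
		if _, exists := merged[d.Model]; !exists {
			merged[d.Model] = d.Params
		}
	}
	return merged
}

// complete returns the sorted model names that start with prefix.
func complete(docs paramDocs, prefix string) []string {
	var names []string
	for name := range docs {
		if strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Stand-in for dictionaries compiled into the client.
	builtin := paramDocs{
		"DNNClassifier": {"hidden_units": "sizes of the hidden layers"},
	}
	// Stand-in for what the server would send back.
	remote := []modelDoc{
		{Model: "example_pkg.MyModel", Params: map[string]string{"depth": "made-up parameter"}},
	}
	docs := mergeDocs(builtin, remote)
	fmt.Println(complete(docs, "example_pkg."))
}
```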
We don’t have an authentication mechanism now. Details in this part are omitted and should be supplied by future studies.
There are two problems that should be considered seriously:
- How to leave an interface for SSO that’s accepted by most companies/organizations.
- How to safely access a SQLFlow service deployed in internal networks.