Link

Running SQLFlow on Kubernetes

This is a tutorial on how to run SQLFlow on Kubernetes, and this tutorial will deploy:

  • A MySQL server instance with some example data loaded,
  • The SQLFlow gRPC server, and
  • The Jupyter Notebook server with SQLFlow magic command installed.
  • The JupyterHub which can serve multiple Notebook server for various users.

There are two sections in this tutorial:

Prerequisites

  1. Setup a Kubernetes cluster: You can refer to the official page to set up a full cluster or use a local quick start tool: minikube This tutorial would use [minikube] to demonstrate the SQLFlow.
  2. Install kubectl, which is the command line tool to interact with the Kubernetes cluster.
  3. Make sure the Kubernetes nodes can pull the official SQLFlow Docker image [sqlflow/sqlflow] or your custom Docker image.

Deploy the All-in-One SQLFlow

  1. Deploy the SQLFlow Pod on Kubernetes
     > kubectl create -f k8s/sqlflow-mysql.yaml
    

    The above command deploys a Pod, a MySQL server instance, a SQLFlow gRPC server and the Jupyter Notebook server runs in this Pod. You can also use your custom Docker image by editting the image field of the yaml file: k8s/sqlflow-all-in-one.yaml

     spec:
         ...
         containers:
         - image: <your repo name>/sqlflow
    
  2. Testing your SQLFlow setup You can find a Pod on Kubernetes which name is sqlflow-all-in-one-<POD-ID>:
     > kubectl get pods
     NAME    READY   STATUS    RESTARTS   AGE
     NAME                             READY   STATUS    RESTARTS   AGE
     sqlflow-all-in-one-9b57566c9-8xkpk   1/1     Running   0          60s
    

Running Your Query in SQLFlow

  1. Copy the node IP of the sqlflow Pod on minikube as the follows command:
     > minikube ip
     192.168.99.100
    

    NOTE: If you are using a real cluster, you can find the node domain/IP from the NODE column usingkubectl get pods -o wide:

     > kubectl get pods -o wide
     NAME                                 READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
     sqlflow-all-in-one-9b57566c9-8xkpk   1/1     Running   0          24s     172.17.0.9   minikube   <none>           <none>
    
  2. Open a web browser and go to ‘:8888', you can find the [SQLFlow example](/sqlflow/doc/tutorial/iris-dnn/) in the Jupyter notebook file lists.

Deploy the SQLFlow Hub

This section will deploys SQLFlow Hub using JupyterHub to serve Jupyter notebook for multiple users, and easy to scale up/down the SQLFlow gRPC server according to workload.

  1. Build the SQLFlow Hub Docker image and push to a registry server that the Kubernetes Note can access it.
     $ cd k8s/sqlflowhub
     $ docker build -t <your-repo>/sqlflowhub .
     $ docker push <your-repo>/sqlflowhub
    
  2. Deploy the MySQL, SQLFlow gRPC server and JupyterHub step by step:

     kubectl create -f k8s/sqlflow-mysql.yaml
     kubectl create -f k8s/sqlflow-server.yaml
     kubectl create -f k8s/sqlflow-jhub.yaml
    

    NOTE: Should grant all the remote hosts can access to the MySQL server if you want to use the custorm MySQL Docker image, the grant command like:

     GRANT ALL PRIVILEGES ON *.* TO 'root'@'' IDENTIFIED BY 'root' WITH GRANT OPTION;
    
  3. Check the SQLFlow Pods, you can find:
    • A MySQL Pod named sqlflow-mysql-*.
    • A JupyterHub Pod named sqlflow-jhub-*.
    • 3 SQLFlow gRPC server Pods named sqlflow-server-*, and it’s easy to scale up/down the replica count by modifying the replicas field of the yaml file: k8s/sqlflow-server.yaml.
       $ kubectl get pods
       NAME                              READY   STATUS    RESTARTS   AGE
       sqlflow-jhub-78f96dcf88-sbvt6     1/1     Running   0          4m13s
       sqlflow-mysql-55db79fd98-nhfjp    1/1     Running   0          4h8m
       sqlflow-server-7444b4466d-frbcn   1/1     Running   0          4h
       sqlflow-server-7444b4466d-h5w9c   1/1     Running   0          4h
       sqlflow-server-7444b4466d-kndwx   1/1     Running   0          4h
      
  4. Check the SQLFlow Service, so the Notebook server can connect them across their ClusterIP and Port:
    • A MySQL Service named sqlflow-mysql, and
    • An SQLFlow server Service named sqlflow-server
    $ kubectl get svc
    NAME             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
    kubernetes       ClusterIP   10.96.0.1        <none>        443/TCP     29d
    sqlflow-mysql    ClusterIP   10.102.193.217   <none>        3306/TCP    6h4m
    sqlflow-server   ClusterIP   10.102.65.39     <none>        50051/TCP   5h56m
    

Login the JupyterHub

JupyterHub using the PAMAuthenticator as the default authenticate method. the PAM can authenticate the system users with their username and password. You can find more information on authenticators-users-basics, and other authenticator methods from here

Next, please do as the following steps to create a user on the system and login on the Jupyterhub:

  1. List the Pods and execute into the sqlflow-jhub Pod
     $ kubectl get po
     NAME                              READY   STATUS    RESTARTS   AGE
     sqlflow-jhub-78f96dcf88-gp8dg     1/1     Running   0          26m
     sqlflow-mysql-55db79fd98-nhfjp    1/1     Running   0          51m
     ...
     $ kubectl exec -it sqlflow-jhub-78f96dcf88-gp8dg bash 
    
  2. Create a user and set a password by the adduser command:
     $ adduser sqlflow -q --gecos "" --home /home/sqlflow
     Enter new UNIX password:
     Retype new UNIX password:
     passwd: password updated successfully
    
  3. Open your browser and go to <node-ip>:8000 and log in by the username/password as the above step. If you passed the authenticator, the JupyterHub would launch a Notebook server for your account, and then you can run your SQLFlow query in it.