This document describes the design of model evaluation task for ElasticDL.
Model evaluation: Computing metrics to judge the performance of the trained model.
Evaluation worker: The worker responsible for performing model evaluation task.
Multiprocessing: Executing tasks in multiple threads in parallel on the same pod.
- There’s only one evaluation worker without multiprocessing.
- Master pod is responsible for creating the evaluation worker.
- Evaluation worker is created by master pod together with the workers for training.
- Evaluation starts after a specified warm-up period and on a given time interval. For example, we need to expose
the following parameters to users:
start_delay_secs: Start evaluating after waiting for this many seconds.
throttle_secs: Do not re-evaluate unless the last evaluation was started at least this many seconds ago.
- The evaluation worker fetches the latest model from master pod.
- Model can be evaluated by a specified number of steps or batches of evaluation samples. If
None, evaluation will continue until reaching the end of input.
- Model evaluation metrics can be defined by users together with the model definition.
- The computed model evaluation metrics can be report back to master through RPC call.
MasterServicer.ReportEvaluationMetrics()and additional proto definitions such as
Workerto support the following:
distributed_evaluate()that contains the main logic for model evaluation.
report_task_result()that reports evaluation task result (e.g. task id and error message) back to master through RPC call.
report_evaluation_metrics()that reports the computed evaluation metrics (e.g. accuracy, precision, recall, etc.) back to master through RPC call.
- Add main CLI entry-point to
Worker.distributed_evaluate()that will be used in
WorkerManagerto support the following:
- Instantiate a separate evaluation task queue from evaluation data directory.
- Start an evaluation worker from evaluation task queue.
master.main()to support model evaluation task if user requested.
A list of potential features we may want for model evaluation in the future:
num_parallel_processes: The number of children processes to run evaluation on each individual evaluation worker.
sample_weights: Optional Numpy array of weights for the test samples, used for weighting the loss function.
Some of the ideas are borrowed from existing solutions listed below: