Launching an experiment

An optimization experiment is defined by the combination of:

A component (see section registering a component) and its parametric space (the values the optimizer is allowed to test)
An application: a file that can be launched through Slurm
An optimization heuristic: an optimization algorithm, a number of exploration and exploitation steps, an optional pruning strategy and an optional noise reduction strategy. More details on black-box optimization are available here.

Launching an experiment through the Web interface

The preferred way of launching an optimization experiment is through the Web interface, by filling out the menu available at the path /create.

Experiment arguments

The component and its parametric space: the component to optimize, among the registered ones, as well as the parametric space, defined through a min, max and a step value.
The application: the application to optimize is written as a shell file, directly through the Web interface.
Number of iterations: the number of exploration iterations, specified as an integer value.

Setting up the optimizer

Pruning strategies: this option stops jobs that run longer than a threshold, which is either the execution time of the default parametrization or the current median execution time.
Noise reduction strategies: this option resamples each parametrization a certain number of times to smooth out possible noise. The resampling can be static (i.e. a fixed number of resamples) or dynamic (i.e. the parametrization is repeated until the 95% confidence interval falls below a fixed threshold). The fitness aggregation function is the transformation applied to the target metric values measured for the same parametrization (it can be either the mean or the median).
Initialization steps: this field specifies the number of parametrizations sampled using a Latin Hypercube Design before starting the optimization heuristic.
Optimization heuristic: three heuristics (surrogate models, genetic algorithms and simulated annealing) can be chosen, along with their hyperparameters. For a detailed overview of the available heuristics and their hyperparametrization, refer to this section here.
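
The dynamic resampling described above can be illustrated with a minimal sketch. It is not SHAMan's implementation: the helper resample_until_confident, its arguments and the simulated run_once function are hypothetical, the confidence interval uses a normal approximation, and "below a fixed threshold" is interpreted here as the width of the interval.

```python
import math
import random
import statistics


def resample_until_confident(run_once, threshold, min_runs=2, max_runs=50, z=1.96):
    """Repeat run_once() until the 95% confidence interval is narrow enough.

    Hypothetical helper, not part of SHAMan: run_once stands in for one
    execution of the application with a fixed parametrization.
    """
    samples = [run_once() for _ in range(min_runs)]
    while len(samples) < max_runs:
        # Width of the 95% confidence interval (normal approximation).
        width = 2 * z * statistics.stdev(samples) / math.sqrt(len(samples))
        if width < threshold:
            break
        samples.append(run_once())
    return samples


# Simulated noisy execution times, in seconds (assumption for the example).
rng = random.Random(0)
samples = resample_until_confident(lambda: rng.gauss(10.0, 0.5), threshold=0.5)

# Fitness aggregation: the median here, the mean would work the same way.
fitness = statistics.median(samples)
```

Static resampling corresponds to skipping the loop and always collecting the same number of samples.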

Launching an experiment through the command line interface

The shaman-optimize command

Tip

Launching an experiment through the command line interface gives access to more options to tune the optimizer, but it is also riskier because the configuration is not checked before the experiment runs. Beware!

Warning

Before running the shaman-optimize command, you must source the environment file written during the installation step.

Launching an optimization experiment can also be done through the command line interface shaman-optimize.

shaman-optimize --help

Usage: shaman-optimize [OPTIONS]

  Run an optimization experiment.

Options:
  --component-name TEXT      The name of the component to tune  [required]
  --nbr-iteration INTEGER    The maximal number of iterations to run the
                             experiment for.  [required]

  --sbatch-file TEXT         The path to the sbatch file  [required]
  --experiment-name TEXT     The name to give the experiment  [required]
  --configuration-file TEXT  The path to the configuration file  [required]
  --sbatch-dir TEXT          The directory to store the sbatch
  --slurm-dir TEXT           The directory to write the slurm outputs
  --result-file TEXT         The path to the result file.
  --help                     Show this message and exit.

It takes the following arguments:

The component name: the name of the registered component to tune.
The application to optimize: a file that can be submitted with the sbatch command.
The number of iterations: the maximal number of iterations allocated to the experiment.
Experiment name: the name of the experiment, used to identify it in the Web interface.
Configuration file: the configuration file of the experiment, see next section.
Directory to store the sbatch: an optional argument indicating where the sbatch file generated by SHAMan must be stored.
Directory to store the slurm outputs: an optional argument indicating where to store the Slurm outputs.
Result file: an optional argument giving the path to the result file.
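
As an illustration, the required options can be assembled into a command line as sketched below. Only flags listed in the help output are used; the component name, experiment name and file paths are hypothetical placeholders.

```python
import shlex

# Hypothetical values, for illustration only: replace them with a registered
# component, your own sbatch file and your own configuration file.
args = {
    "--component-name": "component_1",
    "--nbr-iteration": "50",
    "--sbatch-file": "my_app.sbatch",
    "--experiment-name": "my_experiment",
    "--configuration-file": "config.yaml",
}

command = ["shaman-optimize"]
for flag, value in args.items():
    command += [flag, value]

# Print the command as it would be typed in a shell.
print(shlex.join(command))

# Once the environment file has been sourced, the experiment could be
# launched with: subprocess.run(command, check=True)
```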

Writing a configuration file

SHAMan's configuration file is written in YAML and contains five sections:

experiment: contains information about the experiment. Possible options:
default_first: set to True if the first parametrization tested by the optimizer must be the default one.
bbo: parametrizes the optimizer. The possible options are the different arguments taken by the optimizer. All the arguments described in the file will be passed as kwargs of the BBOptimizer class (see section stand-alone optimization of the documentation).
noise_reduction: parametrizes the noise reduction features of the optimizer. The possible arguments are the different noise reduction strategies available in bbo, as well as their specific kwargs. For more details on how to use noise reduction with bbo, see section.
pruning: parametrizes the pruning strategy features of the optimizer (see section for more details). Possible options:
max_step_duration: the maximum elapsed time before stopping the parametrization. Can either be:
- A numpy estimator (for example, numpy.median, numpy.mean, etc). Runs that go above the value of the estimator computed on the already tested execution times are interrupted.
- A float value (for example, 5), which corresponds to an elapsed time in seconds
- The string default, which stops runs that take longer than the default parametrization. The option default_first in the experiment section must be set to True.
components: the selected component and either its parametric grid, defined using the min, max and step format, or a list of the possible values. For a parametric grid, if step_type is set to multiplicative, the step is applied multiplicatively (each value is the previous one multiplied by step).
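
The three forms accepted by max_step_duration can be summarized with a short sketch. This is not SHAMan's code: the function pruning_threshold and its arguments are hypothetical, and statistics.median stands in for a numpy estimator such as numpy.median so that the example has no dependency.

```python
import statistics


def pruning_threshold(max_step_duration, elapsed_times, default_time=None):
    """Return the elapsed time (in seconds) above which a run is stopped.

    Hypothetical helper illustrating the three accepted forms.
    """
    if max_step_duration == "default":
        # Only meaningful if the default parametrization was run first.
        if default_time is None:
            raise ValueError("default_first must be True to use 'default'")
        return default_time
    if callable(max_step_duration):
        # Estimator computed on the already tested execution times.
        return max_step_duration(elapsed_times)
    # Plain numeric value, in seconds.
    return float(max_step_duration)


history = [12.0, 9.5, 15.0, 10.5]  # made-up execution times, in seconds

print(pruning_threshold(statistics.median, history))              # 11.25
print(pruning_threshold(5, history))                              # 5.0
print(pruning_threshold("default", history, default_time=12.0))   # 12.0
```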

An example of a configuration file is written below:

experiment:
  default_first: True

bbo:
  heuristic: surrogate_model
  regression_model: sklearn.gaussian_process.GaussianProcessRegressor
  next_parameter_strategy: bbo.heuristics.surrogate_models.next_parameter_strategies.expected_improvement
  initial_sample_size: 10
  max_retry: 5
  reevaluate: false
  stop_criterion: improvement_criterion
  stop_window: 5
  improvement_estimator: numpy.min
  improvement_threshold: 0.01

components:
  component_1:
    param_1:
      min: 1
      max: 20
      step: 1
    param_2:
      min: 2
      max: 16
      step: 2
      step_type: multiplicative
    param_3:
      - value_1
      - value_3

This example configuration file launches an experiment which:

Runs the default parametrization first
Uses Bayesian Optimization with expected improvement as the acquisition function
Uses component_1, where param_1 can take any integer value from 1 to 20, param_2 can take any power of 2 from 2 to 16, and param_3 is either 'value_1' or 'value_3'.
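
To make the parametric grids of this example concrete, the sketch below enumerates the values of param_1 and param_2. The function expand_grid is hypothetical, the name additive for the default step type is an assumption, and both bounds are assumed to be inclusive.

```python
def expand_grid(minimum, maximum, step, step_type="additive"):
    """List the values of a parameter defined by min, max and step.

    Hypothetical helper; both bounds are assumed to be inclusive.
    """
    multiplicative = step_type == "multiplicative"
    if multiplicative and (minimum <= 0 or step <= 1):
        raise ValueError("a multiplicative grid needs min > 0 and step > 1")
    if not multiplicative and step <= 0:
        raise ValueError("an additive grid needs step > 0")

    values = []
    current = minimum
    while current <= maximum:
        values.append(current)
        current = current * step if multiplicative else current + step
    return values


param_1 = expand_grid(1, 20, 1)
param_2 = expand_grid(2, 16, 2, step_type="multiplicative")

print(param_1)  # the integers from 1 to 20
print(param_2)  # [2, 4, 8, 16]
```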

Visualizing an experiment

Once the experiment has been launched through either of the two methods described above, it can be visualized in the Web interface, where different statistics are displayed. If the experiment is still running, its progress is shown in real time.