Two-Loop Hyperoptimization

November 23, 2018 · 4 min read

I recently started using Scikit-Optimize (or skopt for short) to run Bayesian optimization on the hyperparameters of a bunch of fully-connected neural networks. Overall, it’s a very helpful tool! The hyperparameters I optimized are

  • the number of dense layers $n_\text{l}$
  • the optimizer’s learning rate $r_l$
  • the number of nodes in each layer $N_\text{n} = \{n_{\text{n},i}\}_{i \in I}$
  • the dropout rates for the Monte Carlo dropout layers preceding every dense layer $R_\text{d} = \{r_{\text{d},i}\}_{i \in I}$
  • the activation functions $A = \{a_i\}_{i \in I}$ in each layer

where $I = \{1,\dots,n_\text{l}\}$. And right there I had a use case that skopt doesn’t appear to cover - at least not out of the box. The difficulty is that the last three items in the list each contain one entry per layer, so their size depends on the value of the first item. That’s an ill-posed optimization problem. We’d be trying to minimize a loss function $L(\vec\theta)$ parametrized by the vector $\vec\theta = (n_\text{l}, r_l, N_\text{n}, R_\text{d}, A)$ over a parameter space $\mathcal{S}$ which itself depends on the current parameters $\vec\theta$, i.e.

$$\vec\theta_\text{min} = \underset{\vec\theta \in \mathcal{S}(\vec\theta)}{\arg \min} \; L(\vec\theta).$$
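
To make that dependence concrete, here is a minimal sketch in plain Python (no skopt involved). The helpers `inner_param_names` and `all_param_names` are hypothetical and exist only for illustration. They list the hyperparameters of a network with a given number of layers, which shows that the search space has $2 + 3 n_\text{l}$ dimensions and therefore changes shape whenever a different $n_\text{l}$ is proposed.

```python
def inner_param_names(n_layers):
    """Names of the layer-wise hyperparameters for n_layers dense layers."""
    nodes = [f"n_nodes_{i + 1}" for i in range(n_layers)]
    dropouts = [f"dropout_rate_{i + 1}" for i in range(n_layers)]
    activations = [f"activation_{i + 1}" for i in range(n_layers)]
    return nodes + dropouts + activations


def all_param_names(n_layers):
    # n_layers and learning_rate exist for every network,
    # the remaining parameters scale with the number of layers
    return ["n_layers", "learning_rate"] + inner_param_names(n_layers)


for n_layers in (1, 3, 5):
    print(n_layers, len(all_param_names(n_layers)))
```

A one-layer network has 5 hyperparameters in total, a five-layer network has 17.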

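But of course the search space can’t change while we’re searching it! Luckily, there’s a simple fix. We just split the problem into two separate minimizations by pulling the number of layers $n_\text{l}$, together with everything that doesn’t depend on it, into an outer loop. The inner loop then optimizes the remaining layer-wise parameters over a search space that stays fixed for the duration of each outer iteration. Hence, two-loop hyperoptimization. This yields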

$$\vec\theta_\text{min} = \underset{\vec\theta_1 \in \mathcal{S}_1}{\arg \min} \; \underset{\vec\theta_2 \in \mathcal{S}_2(\vec\theta_1)}{\arg \min} \; L(\vec\theta_1, \vec\theta_2),$$

where $\vec\theta = (\vec\theta_1,\vec\theta_2)$ with $\vec\theta_1 = (n_\text{l}, r_l)$ and $\vec\theta_2 = (N_\text{n}, R_\text{d}, A)$. The full search space is then given by $\mathcal{S} = \mathcal{S}_1 \cup \bigcup_{\vec\theta_1 \in \mathcal{S}_1} \mathcal{S}_2(\vec\theta_1)$.
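
The nested structure can be sketched with nothing but the standard library. In the toy example below, plain random search stands in for skopt’s minimizers, and `toy_loss` is a made-up stand-in for the cross-validated loss of a network; both are assumptions for illustration only. To keep it short, $\vec\theta_2$ contains just the layer widths. The point is that the inner search space is built from the outer parameters and stays fixed while the inner loop runs.

```python
import random


def toy_loss(n_layers, learning_rate, widths):
    """Hypothetical stand-in for the cross-validated loss of a network."""
    # pretend 3 layers of 50 nodes and a learning rate of 3e-5 are optimal
    width_penalty = sum((w - 50) ** 2 for w in widths) / 1e4
    return (n_layers - 3) ** 2 + (learning_rate * 1e5 - 3) ** 2 + width_penalty


def sample_outer(rng):
    # theta_1 = (n_layers, learning_rate), drawn from the fixed space S_1
    return rng.randint(1, 5), rng.uniform(1e-5, 1e-4)


def sample_inner(rng, n_layers):
    # theta_2 is drawn from S_2(theta_1): one width per layer
    return [rng.randint(10, 100) for _ in range(n_layers)]


def inner_min(rng, n_layers, learning_rate, n_calls):
    """Inner loop: minimize over theta_2 for fixed theta_1."""
    candidates = (sample_inner(rng, n_layers) for _ in range(n_calls))
    return min((toy_loss(n_layers, learning_rate, w), w) for w in candidates)


def two_loop_min(n_calls=(20, 20), seed=0):
    """Outer loop: minimize the inner minimum over theta_1."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_calls[0]):
        n_layers, learning_rate = sample_outer(rng)
        loss, widths = inner_min(rng, n_layers, learning_rate, n_calls[1])
        if best is None or loss < best[0]:
            best = (loss, n_layers, learning_rate, widths)
    return best


loss, n_layers, learning_rate, widths = two_loop_min()
print(f"loss={loss:.3f}, n_layers={n_layers}, widths={widths}")
```

Note that the outer loop only ever sees the minimum loss returned by the inner loop, so each outer evaluation costs a full inner minimization.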

Implemented in Python, it doesn’t look quite as pretty anymore. That is partly because skopt insists on calling its objective function with a single argument, namely a list of the current hyperparameter values. Passing in anything else, such as the outer loop’s current parameters that the inner loop needs here, therefore requires a workaround. The best I could come up with is some slightly verbose currying. See for yourself:

import keras
import numpy as np
import skopt
from skopt.space import Categorical, Integer, Real

from .cross_val import cross_val

# iteration counter for the current outer and inner minimization loop
iter_counts = [1, 1]

def hyper_opt(model, data, n_calls=(10, 10), methods=["gp"], n_splits=3, verbose=False):
    """
    model: instance of Model class
    data: instance of Data class
    n_calls: 2-tuple of ints, number of iterations for the (outer, inner) minimization loop
    methods: list of strings containing one or more of gp, dummy, forest
        specifies which of skopt's minimizers to try
    n_splits: int, how many cross validations to perform, min=1
    """
    outer_space = [  # pun intended
        # number of layers
        Integer(1, 5, name="n_layers"),
        # optimizer's learning rate
        Real(1e-5, 1e-4, "log-uniform", name="learning_rate"),
    ]

    def curried_inner_objective(n_layers, learning_rate):
        def inner_objective(inner_hypars):
            # unpack inner/outer iteration counters (iic/oic) and total number of calls (inc/onc)
            oic, iic, onc, inc = iter_counts + list(n_calls)
            model.log(f"Hyper loop: outer {oic}/{onc}, inner {iic}/{inc}")
            iter_counts[1] += 1

            # skopt passes inner_hypars as a list, so unpack it into the tuple
            hypars = (n_layers, learning_rate, *inner_hypars)
            loss = cross_val(model, data, hypars, n_splits=n_splits)

            if loss < model.min_loss:
                model.log(f"found new min loss {round(loss, 4)}")
                model.log(f"params: {hypars}")
                model.min_loss = loss
                model.best_hypars = hypars
                model.best_model = model.model

            return loss

        return inner_objective

    def curried_outer_objective(method):
        def outer_objective(outer_hypars):

            n_layers, learning_rate = outer_hypars
            # number of nodes in each dense layer
            nodes = (Integer(10, 100, name=f"n_nodes_{i+1}") for i in range(n_layers))
            # fraction of dropped nodes in each dropout layer
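            dropouts = (
                Real(0, 0.5, name=f"dropout_rate_{i+1}") for i in range(n_layers)
            )
            # list of activation functions for each dense layer
            activations = (
                Categorical(["tanh", "relu"], name=f"activation_{i+1}")
                for i in range(n_layers)
            )
            # run the inner minimization over the layer-wise hyperparameters
            res = getattr(skopt, method + "_minimize")(
                curried_inner_objective(n_layers, learning_rate),
                [*nodes, *dropouts, *activations],
                n_calls=n_calls[1],
                verbose=verbose,
            )
            # advance the outer counter and reset the inner one in place
            iter_counts[0] += 1
            iter_counts[1] = 1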
            return res.fun

        return outer_objective

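    for method in methods:
        iter_counts[:] = [1, 1]  # restart the counters for each minimizer
        res = getattr(skopt, method + "_minimize")(
            curried_outer_objective(method), outer_space, n_calls=n_calls[0], verbose=verbose
        )
        setattr(model, method + "hyper_res", res)
        model.log(f"\n{method} hyper optimization results:\n{res}")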
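
As an aside, nested closures are not the only way to hand the outer loop’s parameters to the inner objective. A minimal sketch of an alternative, assuming the same single-argument calling convention, pre-binds the outer parameters with `functools.partial` from the standard library. The names `inner_objective` and `fake_minimize` are hypothetical stand-ins (a dummy loss and a trivial minimizer that calls its objective with one list of parameters), not part of skopt or of the code above.

```python
from functools import partial


def inner_objective(inner_hypars, n_layers, learning_rate):
    """Dummy loss taking the inner parameters first, the outer ones as keywords."""
    hypars = (n_layers, learning_rate, *inner_hypars)
    return sum(float(x) for x in hypars)


def fake_minimize(func, candidates):
    """Stand-in minimizer: calls func with a single list of parameters."""
    return min(func(list(c)) for c in candidates)


# bind the outer-loop parameters, leaving a one-argument objective
objective = partial(inner_objective, n_layers=2, learning_rate=0.5)
print(fake_minimize(objective, [(10, 20), (30, 5)]))
```

Whether this reads better than explicit currying is a matter of taste. It does require the objective to be defined at a scope where `partial` can reach it, so any other state it needs (such as the model and data here) would have to be bound in the same way.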

© 2018 - Janosh Riebesell