How to use the contextualbandits.online._BasePolicyWithExploit class in contextualbandits

To help you get started, we’ve selected a few contextualbandits examples based on popular ways the library is used in public projects.


github david-cortes / contextualbandits / contextualbandits / online.py
        if not self.is_fitted:
            return self._predict_random_if_unfit(X, output_score)

        if exploit:
            scores = self.exploit(X)
        else:
            scores = self.decision_function(X)
        pred = self._name_arms(np.argmax(scores, axis = 1))

        if not output_score:
            return pred
        else:
            score_max = np.max(scores, axis=1).reshape((-1, 1))
            return {"choice" : pred, "score" : score_max}

class BootstrappedUCB(_BasePolicyWithExploit):
    """
    Bootstrapped Upper Confidence Bound

    Obtains an upper confidence bound by taking the percentile of the predictions from a
    set of classifiers, all fit with different bootstrapped samples (multiple samples per arm).
    
    Note
    ----
    When fitting the algorithm to data in batches (online), it's not possible to take an
    exact bootstrapped sample, as the sample is not known in advance. In theory, as the sample size
    grows to infinity, the number of times that an observation appears in a bootstrapped sample is
    distributed ~ Poisson(1). However, I've found that assigning random weights to observations
    produces a more stable effect, so this class also offers the option of assigning weights randomly ~ Gamma(1,1).
    
    Parameters
    ----------
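
The excerpt above (by the look of it, the body of the base class's predict method) shows the shared prediction path that every policy on this page inherits: score each arm, either with an exploit-only estimate or the full decision function, take the argmax, and optionally return the winning score next to the chosen arm. Below is a minimal usage sketch, not taken from the repository: it assumes the fit(X, a, r) interface of the package's online policies (not shown in the excerpt) and uses scikit-learn's LogisticRegression as the base classifier.

import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import BootstrappedUCB

nchoices = 5
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))           # contexts
a = rng.integers(nchoices, size=1000)     # arms that were played
r = rng.integers(2, size=1000)            # binary rewards observed for those arms

policy = BootstrappedUCB(LogisticRegression(max_iter=1000), nchoices=nchoices,
                         nsamples=10, percentile=80)
policy.fit(X, a, r)

# predict() follows the logic in the excerpt: argmax over the per-arm upper
# confidence bounds, optionally returning the winning score as well
choices = policy.predict(X[:5])
with_scores = policy.predict(X[:5], output_score=True)
print(choices)
print(with_scores["choice"], with_scores["score"].ravel())
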
github david-cortes / contextualbandits / contextualbandits / online.py
    References
    ----------
    .. [1] Cortes, David. "Adapting multi-armed bandits policies to contextual bandits scenarios."
           arXiv preprint arXiv:1811.04383 (2018).
    """
    def __init__(self, base_algorithm, nchoices, nsamples=10, percentile=80,
                 beta_prior='auto', smoothing=None, batch_train=False,
                 assume_unique_reward=False, batch_sample_method='gamma',
                 njobs_arms=1, njobs_samples=-1):
        assert (percentile >= 0) and (percentile <= 100)
        self._add_common_params(base_algorithm, beta_prior, smoothing, njobs_arms, nchoices,
                                batch_train, assume_unique_reward, assign_algo = False, prior_def_ucb = True)
        self.percentile = percentile
        self._add_bootstrapped_inputs(base_algorithm, batch_sample_method, nsamples, njobs_samples, self.percentile)

class BootstrappedTS(_BasePolicyWithExploit):
    """
    Bootstrapped Thompson Sampling
    
    Performs Thompson Sampling by fitting several models per class on bootstrapped samples,
    then making predictions by taking one of them at random for each class.
    
    Note
    ----
    When fitting the algorithm to data in batches (online), it's not possible to take an
    exact bootstrapped sample, as the sample is not known in advance. In theory, as the sample size
    grows to infinity, the number of times that an observation appears in a bootstrapped sample is
    distributed ~ Poisson(1). However, I've found that assigning random weights to observations
    produces a more stable effect, so this class also offers the option of assigning weights randomly ~ Gamma(1,1).
    
    Parameters
    ----------
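
The online-bootstrapping note repeated in both docstrings above comes down to how per-observation weights are drawn when fitting in batches. Here is a rough standalone sketch of the two schemes it describes; online_bootstrap_weights is a hypothetical helper name used only for illustration, while in the library itself the choice is made through the batch_sample_method argument shown in BootstrappedUCB's constructor.

import numpy as np

def online_bootstrap_weights(n_obs, method="gamma", rng=None):
    # per-observation weights approximating a bootstrap when the sample
    # is not known in advance
    rng = np.random.default_rng() if rng is None else rng
    if method == "poisson":
        # integer resample counts: each observation appears k ~ Poisson(1) times
        return rng.poisson(lam=1.0, size=n_obs).astype(float)
    if method == "gamma":
        # smoother alternative: continuous weights with mean 1, drawn ~ Gamma(1,1)
        return rng.gamma(shape=1.0, scale=1.0, size=n_obs)
    raise ValueError("method must be 'poisson' or 'gamma'")

# either vector can be passed as sample_weight to a base classifier's fit()
w = online_bootstrap_weights(8, method="gamma", rng=np.random.default_rng(123))
print(np.round(w, 3))
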
github david-cortes / contextualbandits / contextualbandits / online.py
    njobs : int or None
        Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
        set it to the number of CPU cores. Be aware that the algorithm will use BLAS function calls,
        and if these have multi-threading enabled, it might result in a slow-down
        as the parallel jobs and the BLAS routines compete for the available threads.
    
    References
    ----------
    .. [1] Agrawal, Shipra, and Navin Goyal. "Thompson sampling for contextual bandits with linear payoffs."
           International Conference on Machine Learning. 2013.
    """
    def __init__(self, nchoices, v_sq=1.0, njobs=1):
        self._ts = True
        self._add_common_lin(v_sq, nchoices, njobs)

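## Aside, not part of online.py: the njobs note above warns that a policy's
## parallel jobs can compete with multi-threaded BLAS for cores. A hedged
## sketch of one way to avoid that, assuming the external threadpoolctl
## package is available; policy, X, a and r are the names defined in the
## BootstrappedUCB sketch earlier on this page.
from threadpoolctl import threadpool_limits

with threadpool_limits(limits=1, user_api="blas"):   # cap BLAS at one thread
    policy.fit(X, a, r)                              # the policy's own jobs keep the cores
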
class BayesianUCB(_BasePolicyWithExploit):
    """
    Bayesian Upper Confidence Bound
    
    Obtains an upper confidence bound from Bayesian Logistic Regression estimates.
    
    Note
    ----
    The implementation here uses PyMC3's GLM formula with default parameters and ADVI.
    This is a very, very slow implementation, and will probably take at least two
    orders of magnitude longer to fit compared to other methods.
    
    Parameters
    ----------
    nchoices : int or list-like
        Number of arms/labels to choose from. Can also pass a list, array or series with arm names, in which case
        the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
github david-cortes / contextualbandits / contextualbandits / online.py
        ## NOTE: this is a really slow and poorly thought-out implementation
        ## TODO: rewrite using some faster framework such as Edward,
        ##       or with a hard-coded coordinate ascent procedure instead. 
        self._add_common_params(_ZeroPredictor(), beta_prior, smoothing, njobs, nchoices,
                                False, assume_unique_reward, assign_algo = False, prior_def_ucb = True)
        assert (percentile >= 0) and (percentile <= 100)
        self.percentile = percentile
        self.n_iter, self.n_samples = _check_bay_inp(method, n_iter, n_samples)
        self.method = method
        self.base_algorithm = _BayesianLogisticRegression(
                    method = self.method, niter = self.n_iter,
                    nsamples = self.n_samples, mode = 'ucb', perc = self.percentile)
        self.batch_train = False


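## Aside, not part of online.py: BayesianUCB wires the _BayesianLogisticRegression
## base model in 'ucb' mode into the same base-policy machinery, so it is driven
## like the policies above, only much more slowly and with PyMC3 as a requirement.
## A construction-only sketch, assuming the constructor exposes the nchoices and
## percentile arguments seen in this excerpt:
# from contextualbandits.online import BayesianUCB
# bayes_ucb = BayesianUCB(nchoices=5, percentile=80)
# bayes_ucb.fit(X, a, r)                         # same interface, but each refit runs ADVI
# print(bayes_ucb.predict(X[:5], output_score=True))
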
class BayesianTS(_BasePolicyWithExploit):
    """
    Bayesian Thompson Sampling
    
    Performs Thompson Sampling by sampling a set of Logistic Regression coefficients
    from each class, then predicting the class with the highest estimate.

    Note
    ----
    The implementation here uses PyMC3's GLM formula with default parameters and ADVI.
    This is a very, very slow implementation, and will probably take at least two
    orders of magnitude longer to fit compared to other methods.
    
    Parameters
    ----------
    nchoices : int or list-like
        Number of arms/labels to choose from. Can also pass a list, array or series with arm names, in which case