sample module

Sampling with or without weights, with or without replacement.

elimination_sample(k, p, replace=True, prng=None)[source]

Weighted random sample of size k from 1, …, n drawn with or without replacement. The algorithm is inefficient but transparent. Walker’s alias method is more efficient.

Parameters
kint

Desired sample size

p1-D array-like, optional

The probabilities associated with each value in 1, … n.

replaceboolean, optional

Whether the sample is with or without replacement. Default True.

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
list of items sampled
exponential_sample(k, p, prng=None)[source]

Weighted random sample of size of size k from 1, …, n without replacement.

Let X_1, …, X_N be independent exponential random variables with rates w_1, …, w_N, and let W = w_1 + … + w_N.

Then the chance that X_k is the smallest of them is w_k/W.

Because of the “memoryless” property of exponential random variables and the independence, if the smallest is removed, for j!=k, the chance that X_j is the smallest of the remaining variables is w_j/(W-w_k), and so on.

The percentile function of the exponential distribution with rate w is -ln(1-F)/w.

Hence, if U~U[0,1], -ln(U)/w ~ exp(w).

Parameters
kint

Desired sample size

p1-D array-like, optional

The probabilities associated with each value in 1, … n.

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
list of items sampled
fykd_sample(n, k, prng=None)[source]

Use fykd to sample k out of 1, …, n without replacement

Parameters
nint

Population size

kint

Desired sample size

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
list of items sampled
get_prng(seed=None)[source]

Turn seed into a PRNG instance

Parameters
seed{None, int, object}

If seed is None, return a randomly seeded instance of SHA256. If seed is an int, return a new SHA256 instance seeded with seed. If seed is already a PRNG instance, return it. Otherwise raise ValueError.

Returns
object
pikk(n, k, prng=None)[source]

PIKK Algorithm: permute indices and keep k to draw a sample from 1, …, n without replacement. Contrary to what Python does, this assumes indexing starts at 1.

Parameters
nint

Population size

kint

Desired sample size

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
list of items sampled
random_allocation(a, sizes, replace=False, fast=False, p=None, method='sample_by_index', prng=None)[source]

Random samples of sizes sizes from a population a drawn with or without replacement.

Parameters
a1-D array-like or int

If an array or list, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)

sizes1-D array-like

sizes of samples to return

replaceboolean, optional

Whether the sampling is with or without replacement. Default False.

fastboolean, optional

Whether to speed up sampling by not sampling with random order. Default False.

p1-D array-like, optional

The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.

methodstring

Which sampling function? Default sample_by_index

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
sampleslist of lists

The generated random samples

random_permutation(a, method='Fisher-Yates', prng=None)[source]

Construct a random permutation (re-ordering) of a population a.

The algorithms available are:
  • Fisher-Yates: a shuffling algorithm

  • random_sort: generate random floats and sort

  • permute_by_index: sample integer indices without replacement

Parameters
a1-D array-like or int

If an array or list, a random permutation is generated from its elements. If an int, the random permutation is generated as if a were np.arange(a)

methodstring

Which sampling function?

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
samplessingle item or ndarray

The generated random samples

random_sample(a, size, replace=False, fast=False, p=None, method='sample_by_index', prng=None)[source]

Random sample of size size from a population a drawn with or without weights, with or without replacement.

If no weights are provided, the sample is drawn with equal probability of selecting every item. If weights are provided, len(weights) must equal N.

Sampling methods available are:
  • Fisher-Yates: sampling without weights, without replacement

  • PIKK: sampling without weights, without replacement

  • recursive: samping without weights, without replacement

  • Waterman_R: sampling without weights, without replacement

  • Vitter_Z: sampling without weights, without replacement

  • sample_by_index: sampling without weights, without replacement

  • Exponential: sampling with weights, without replacement

  • Elimination: sampling with weights, without replacement

Fisher-Yates, PIKK, sample_by_index, Exponential, and Elimination return ordered samples, i.e. they are equally likely to return [1, 2] as they are to return [2, 1]. Waterman_R, Vitter_Z, and recursive aren’t guaranteed to randomize the order of items in the sample.

Parameters
a1-D array-like or int

If an array or list, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)

sizeint or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

replaceboolean, optional

Whether the sample is with or without replacement. Default False.

fastboolean, optional

Whether to speed up sampling by not sampling with random order. Default False.

p1-D array-like, optional

The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.

methodstring

Which sampling function?

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
samplessingle item or ndarray

The generated random samples

recursive_sample(n, k, prng=None)[source]

Recursive sampling algorithm from Cormen et al Draw a sample of to sample k out of 1, …, n without replacement

Note that if k is larger than the default recursion limit of 1000, this function will throw an error. You can change the recursion depth using sys.setrecursionlimit().

Parameters
nint

Population size

kint

Desired sample size

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
list of items sampled
sample_by_index(n, k, replace=False, fast=False, prng=None)[source]

Select indices uniformly at random to draw a sample of to sample k out of 1, …, n without replacement

Parameters
nint

Population size

kint

Desired sample size

replaceboolean, optional

Whether the sample is with or without replacement. Default False.

fastboolean, optional

Whether to speed up sampling by not sampling with random order. Default False.

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
list of items sampled, in random order if not fast
vitter_z(n, k, prng=None)[source]

Vitter’s Algorithm Z for reservoir SRSs (Vitter 1985). Draw a sample of to sample k out of 1, …, n without replacement

Parameters
nint

Population size

kint

Desired sample size

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
list of items sampled
waterman_r(n, k, prng=None)[source]

Waterman’s Algorithm R for reservoir SRSs Draw a sample of to sample k out of 1, …, n without replacement

Parameters
nint

Population size

kint

Desired sample size

prng{None, int, object}

If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.

Returns
list of items sampled