sample module¶
Sampling with or without weights, with or without replacement.
-
elimination_sample
(k, p, replace=True, prng=None)[source]¶ Weighted random sample of size k from 1, …, n drawn with or without replacement. The algorithm is inefficient but transparent. Walker’s alias method is more efficient.
- Parameters
- kint
Desired sample size
- p1-D array-like, optional
The probabilities associated with each value in 1, … n.
- replaceboolean, optional
Whether the sample is with or without replacement. Default True.
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- list of items sampled
-
exponential_sample
(k, p, prng=None)[source]¶ Weighted random sample of size of size k from 1, …, n without replacement.
Let X_1, …, X_N be independent exponential random variables with rates w_1, …, w_N, and let W = w_1 + … + w_N.
Then the chance that X_k is the smallest of them is w_k/W.
Because of the “memoryless” property of exponential random variables and the independence, if the smallest is removed, for j!=k, the chance that X_j is the smallest of the remaining variables is w_j/(W-w_k), and so on.
The percentile function of the exponential distribution with rate w is -ln(1-F)/w.
Hence, if U~U[0,1], -ln(U)/w ~ exp(w).
- Parameters
- kint
Desired sample size
- p1-D array-like, optional
The probabilities associated with each value in 1, … n.
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- list of items sampled
-
fykd_sample
(n, k, prng=None)[source]¶ Use fykd to sample k out of 1, …, n without replacement
- Parameters
- nint
Population size
- kint
Desired sample size
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- list of items sampled
-
get_prng
(seed=None)[source]¶ Turn seed into a PRNG instance
- Parameters
- seed{None, int, object}
If seed is None, return a randomly seeded instance of SHA256. If seed is an int, return a new SHA256 instance seeded with seed. If seed is already a PRNG instance, return it. Otherwise raise ValueError.
- Returns
- object
-
pikk
(n, k, prng=None)[source]¶ PIKK Algorithm: permute indices and keep k to draw a sample from 1, …, n without replacement. Contrary to what Python does, this assumes indexing starts at 1.
- Parameters
- nint
Population size
- kint
Desired sample size
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- list of items sampled
-
random_allocation
(a, sizes, replace=False, fast=False, p=None, method='sample_by_index', prng=None)[source]¶ Random samples of sizes sizes from a population a drawn with or without replacement.
- Parameters
- a1-D array-like or int
If an array or list, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)
- sizes1-D array-like
sizes of samples to return
- replaceboolean, optional
Whether the sampling is with or without replacement. Default False.
- fastboolean, optional
Whether to speed up sampling by not sampling with random order. Default False.
- p1-D array-like, optional
The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.
- methodstring
Which sampling function? Default sample_by_index
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- sampleslist of lists
The generated random samples
-
random_permutation
(a, method='Fisher-Yates', prng=None)[source]¶ Construct a random permutation (re-ordering) of a population a.
- The algorithms available are:
Fisher-Yates: a shuffling algorithm
random_sort: generate random floats and sort
permute_by_index: sample integer indices without replacement
- Parameters
- a1-D array-like or int
If an array or list, a random permutation is generated from its elements. If an int, the random permutation is generated as if a were np.arange(a)
- methodstring
Which sampling function?
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- samplessingle item or ndarray
The generated random samples
-
random_sample
(a, size, replace=False, fast=False, p=None, method='sample_by_index', prng=None)[source]¶ Random sample of size size from a population a drawn with or without weights, with or without replacement.
If no weights are provided, the sample is drawn with equal probability of selecting every item. If weights are provided, len(weights) must equal N.
- Sampling methods available are:
Fisher-Yates: sampling without weights, without replacement
PIKK: sampling without weights, without replacement
recursive: samping without weights, without replacement
Waterman_R: sampling without weights, without replacement
Vitter_Z: sampling without weights, without replacement
sample_by_index: sampling without weights, without replacement
Exponential: sampling with weights, without replacement
Elimination: sampling with weights, without replacement
Fisher-Yates, PIKK, sample_by_index, Exponential, and Elimination return ordered samples, i.e. they are equally likely to return [1, 2] as they are to return [2, 1]. Waterman_R, Vitter_Z, and recursive aren’t guaranteed to randomize the order of items in the sample.
- Parameters
- a1-D array-like or int
If an array or list, a random sample is generated from its elements. If an int, the random sample is generated as if a were np.arange(a)
- sizeint or tuple of ints, optional
Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
- replaceboolean, optional
Whether the sample is with or without replacement. Default False.
- fastboolean, optional
Whether to speed up sampling by not sampling with random order. Default False.
- p1-D array-like, optional
The probabilities associated with each entry in a. If not given the sample assumes a uniform distribution over all entries in a.
- methodstring
Which sampling function?
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- samplessingle item or ndarray
The generated random samples
-
recursive_sample
(n, k, prng=None)[source]¶ Recursive sampling algorithm from Cormen et al Draw a sample of to sample k out of 1, …, n without replacement
Note that if k is larger than the default recursion limit of 1000, this function will throw an error. You can change the recursion depth using sys.setrecursionlimit().
- Parameters
- nint
Population size
- kint
Desired sample size
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- list of items sampled
-
sample_by_index
(n, k, replace=False, fast=False, prng=None)[source]¶ Select indices uniformly at random to draw a sample of to sample k out of 1, …, n without replacement
- Parameters
- nint
Population size
- kint
Desired sample size
- replaceboolean, optional
Whether the sample is with or without replacement. Default False.
- fastboolean, optional
Whether to speed up sampling by not sampling with random order. Default False.
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- list of items sampled, in random order if not fast
-
vitter_z
(n, k, prng=None)[source]¶ Vitter’s Algorithm Z for reservoir SRSs (Vitter 1985). Draw a sample of to sample k out of 1, …, n without replacement
- Parameters
- nint
Population size
- kint
Desired sample size
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- list of items sampled
-
waterman_r
(n, k, prng=None)[source]¶ Waterman’s Algorithm R for reservoir SRSs Draw a sample of to sample k out of 1, …, n without replacement
- Parameters
- nint
Population size
- kint
Desired sample size
- prng{None, int, object}
If prng is None, return a randomly seeded instance of SHA256. If prng is an int, return a new SHA256 instance seeded with seed. If prng is already a PRNG instance, return it.
- Returns
- list of items sampled