Class ReservoirSamplerWithReplacement<T>
- java.lang.Object
-
- org.apache.flink.api.java.sampling.RandomSampler<T>
-
- org.apache.flink.api.java.sampling.DistributedRandomSampler<T>
-
- org.apache.flink.api.java.sampling.ReservoirSamplerWithReplacement<T>
-
- Type Parameters:
T- The type of sample.
@Internal public class ReservoirSamplerWithReplacement<T> extends DistributedRandomSampler<T>
A simple in memory implementation of Reservoir Sampling with replacement and with only one pass through the input iteration whose size is unpredictable. The basic idea behind this sampler implementation is quite similar toReservoirSamplerWithoutReplacement. The main difference is that, in the first phase, we generate weights for each element K times, so that each element can get selected multiple times.This implementation refers to the algorithm described in "Optimal Random Sampling from Distributed Streams Revisited".
-
-
Field Summary
-
Fields inherited from class org.apache.flink.api.java.sampling.DistributedRandomSampler
emptyIntermediateIterable, numSamples
-
Fields inherited from class org.apache.flink.api.java.sampling.RandomSampler
emptyIterable, EPSILON
-
-
Constructor Summary
Constructors Constructor Description ReservoirSamplerWithReplacement(int numSamples)Create a sampler with fixed sample size and default random number generator.ReservoirSamplerWithReplacement(int numSamples, long seed)Create a sampler with fixed sample size and random number generator seed.ReservoirSamplerWithReplacement(int numSamples, Random random)Create a sampler with fixed sample size and random number generator.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description Iterator<IntermediateSampleData<T>>sampleInPartition(Iterator<T> input)Sample algorithm for the first phase.-
Methods inherited from class org.apache.flink.api.java.sampling.DistributedRandomSampler
sample, sampleInCoordinator
-
-
-
-
Constructor Detail
-
ReservoirSamplerWithReplacement
public ReservoirSamplerWithReplacement(int numSamples)
Create a sampler with fixed sample size and default random number generator.- Parameters:
numSamples- Number of selected elements, must be non-negative.
-
ReservoirSamplerWithReplacement
public ReservoirSamplerWithReplacement(int numSamples, long seed)Create a sampler with fixed sample size and random number generator seed.- Parameters:
numSamples- Number of selected elements, must be non-negative.seed- Random number generator seed
-
ReservoirSamplerWithReplacement
public ReservoirSamplerWithReplacement(int numSamples, Random random)Create a sampler with fixed sample size and random number generator.- Parameters:
numSamples- Number of selected elements, must be non-negative.random- Random number generator
-
-
Method Detail
-
sampleInPartition
public Iterator<IntermediateSampleData<T>> sampleInPartition(Iterator<T> input)
Description copied from class:DistributedRandomSamplerSample algorithm for the first phase. It operates on a single partition.- Specified by:
sampleInPartitionin classDistributedRandomSampler<T>- Parameters:
input- The DataSet input of each partition.- Returns:
- Intermediate sample output which will be used as the input of the second phase.
-
-