Class ReservoirSamplerWithReplacement<T>

  • Type Parameters:
    T - The type of sample.

    @Internal
    public class ReservoirSamplerWithReplacement<T>
    extends DistributedRandomSampler<T>
    A simple in-memory implementation of reservoir sampling with replacement that makes a single pass over an input iterator whose size is not known in advance. The basic idea is similar to ReservoirSamplerWithoutReplacement. The main difference is that, in the first phase, K weights are generated for each element, where K is the sample size (numSamples), so that each element can be selected multiple times.

    This implementation is based on the algorithm described in "Optimal Random Sampling from Distributed Streams Revisited".
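
The weighting idea described above can be illustrated with a minimal standalone sketch. It is not the Flink class: `ReplacementReservoirSketch`, `Weighted`, and the static `sample` helper are hypothetical names introduced here, and the sketch assumes the first phase keeps the K largest weights in a min-heap, with every input element drawing K weights.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical standalone sketch; names are illustrative, not part of the Flink API.
final class ReplacementReservoirSketch {

    // An element paired with one random weight drawn for it.
    private static final class Weighted<T> implements Comparable<Weighted<T>> {
        final double weight;
        final T element;

        Weighted(double weight, T element) {
            this.weight = weight;
            this.element = element;
        }

        @Override
        public int compareTo(Weighted<T> other) {
            return Double.compare(weight, other.weight);
        }
    }

    // Single pass over the input, keeping the numSamples largest weights seen so far.
    static <T> List<T> sample(Iterator<T> input, int numSamples, Random random) {
        if (numSamples < 0) {
            throw new IllegalArgumentException("numSamples must be non-negative");
        }
        List<T> result = new ArrayList<>(numSamples);
        if (numSamples == 0 || !input.hasNext()) {
            return result;
        }

        // Min-heap: the head is always the smallest weight currently retained.
        PriorityQueue<Weighted<T>> queue = new PriorityQueue<>(numSamples);

        // The first element fills every slot, each with its own weight.
        T first = input.next();
        for (int i = 0; i < numSamples; i++) {
            queue.add(new Weighted<>(random.nextDouble(), first));
        }

        // Every later element draws numSamples weights, so it can win several slots.
        while (input.hasNext()) {
            T element = input.next();
            for (int i = 0; i < numSamples; i++) {
                double weight = random.nextDouble();
                if (weight > queue.peek().weight) {
                    queue.poll();
                    queue.add(new Weighted<>(weight, element));
                }
            }
        }

        for (Weighted<T> entry : queue) {
            result.add(entry.element);
        }
        return result;
    }
}
```

Calling `ReplacementReservoirSketch.sample(list.iterator(), 3, new Random(42L))` returns three elements, possibly with repeats. Because the sample is always filled from the first element onward, the result has exactly `numSamples` entries whenever the input is non-empty, even if the input is shorter than `numSamples`.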

    • Constructor Detail

      • ReservoirSamplerWithReplacement

        public ReservoirSamplerWithReplacement(int numSamples)
        Creates a sampler with a fixed sample size and the default random number generator.
        Parameters:
        numSamples - Number of elements to select; must be non-negative.
      • ReservoirSamplerWithReplacement

        public ReservoirSamplerWithReplacement(int numSamples,
                                               long seed)
        Creates a sampler with a fixed sample size and a random number generator seed.
        Parameters:
        numSamples - Number of elements to select; must be non-negative.
        seed - Seed for the random number generator.
      • ReservoirSamplerWithReplacement

        public ReservoirSamplerWithReplacement(int numSamples,
                                               Random random)
        Creates a sampler with a fixed sample size and the given random number generator.
        Parameters:
        numSamples - Number of elements to select; must be non-negative.
        random - Random number generator to use.