Class NumericColumnSummary<T>

  • Type Parameters:
    T - the numeric type e.g. Integer, Double
    All Implemented Interfaces:
    Serializable

    @Deprecated
    @PublicEvolving
    public class NumericColumnSummary<T>
    extends ColumnSummary
    implements Serializable
    Deprecated.
    All Flink DataSet APIs are deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application with the DataSet API, but you should move to the DataStream API, the Table API, or both.
    Generic Column Summary for Numeric Types.

    Some values are considered "missing" where "missing" is defined as null, NaN, or Infinity. These values are ignored in some calculations like mean, variance, and standardDeviation.

    Uses the Kahan summation algorithm to avoid numeric instability when computing variance. The algorithm is described in "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et al., International Conference on Data Engineering 2012.
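
    The following is a minimal, standalone sketch of Kahan (compensated) summation, included only to illustrate the idea named above. It is not Flink's implementation: the class and method names (KahanSumSketch, kahanSum, naiveSum) are hypothetical, and it sums plain values rather than computing a variance.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Flink.
final class KahanSumSketch {

    /** Sums the values while tracking the rounding error lost at each step. */
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double compensation = 0.0; // low-order bits lost by earlier additions
        for (double v : values) {
            double y = v - compensation;   // re-apply the previously lost bits
            double t = sum + y;            // low-order bits of y may be lost here
            compensation = (t - sum) - y;  // recover what was just lost
            sum = t;
        }
        return sum;
    }

    /** Plain left-to-right summation, for comparison. */
    static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // One large value followed by a million values too small to change it
        // when added one at a time.
        double[] values = new double[1_000_001];
        values[0] = 1.0;
        Arrays.fill(values, 1, values.length, 1e-16);

        System.out.println("naive: " + naiveSum(values));
        System.out.println("kahan: " + kahanSum(values));
    }
}
```

    In this input each small term is below half the spacing of doubles near 1.0, so the naive loop never moves away from 1.0, while the compensated loop carries the lost bits forward and ends close to 1.0 + 1e-10.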

    See Also:
    FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API), Serialized Form
    • Constructor Detail

      • NumericColumnSummary

        public NumericColumnSummary(long nonMissingCount,
                                    long nullCount,
                                    long nanCount,
                                    long infinityCount,
                                    T min,
                                    T max,
                                    T sum,
                                    Double mean,
                                    Double variance,
                                    Double standardDeviation)
        Deprecated.
    • Method Detail

      • getMissingCount

        public long getMissingCount()
        Deprecated.
        The number of "missing" values where "missing" is defined as null, NaN, or Infinity.

        These values are ignored in some calculations like mean, variance, and standardDeviation.

      • getNonMissingCount

        public long getNonMissingCount()
        Deprecated.
        The number of values that are not null, NaN, or Infinity.
      • getNonNullCount

        public long getNonNullCount()
        Deprecated.
        The number of non-null values in this column.
        Specified by:
        getNonNullCount in class ColumnSummary
      • getNullCount

        public long getNullCount()
        Deprecated.
        Description copied from class: ColumnSummary
        The number of null values in this column.
        Specified by:
        getNullCount in class ColumnSummary
      • getNanCount

        public long getNanCount()
        Deprecated.
        Number of values that are NaN.

        (always zero for types like Short, Integer, Long)

      • getInfinityCount

        public long getInfinityCount()
        Deprecated.
        Number of values that are positive or negative infinity.

        (always zero for types like Short, Integer, Long)
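
        As a rough illustration of how the counts above relate to each other, the sketch below classifies the values of a Double column and derives the missing and non-null totals from the four basic counters. It is standalone code that assumes the relationships implied by the descriptions (missing = null + NaN + infinity, non-null = non-missing + NaN + infinity); the class name MissingCountSketch and its members are hypothetical and do not come from Flink.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Flink.
final class MissingCountSketch {
    final long nonMissing;
    final long nulls;
    final long nans;
    final long infinities;

    private MissingCountSketch(long nonMissing, long nulls, long nans, long infinities) {
        this.nonMissing = nonMissing;
        this.nulls = nulls;
        this.nans = nans;
        this.infinities = infinities;
    }

    /** Puts every value of the column into exactly one of the four counters. */
    static MissingCountSketch of(List<Double> column) {
        long nonMissing = 0, nulls = 0, nans = 0, infinities = 0;
        for (Double v : column) {
            if (v == null) {
                nulls++;
            } else if (v.isNaN()) {
                nans++;
            } else if (v.isInfinite()) {
                infinities++; // positive or negative infinity
            } else {
                nonMissing++;
            }
        }
        return new MissingCountSketch(nonMissing, nulls, nans, infinities);
    }

    /** Assumed relationship: "missing" means null, NaN, or Infinity. */
    long missingCount() {
        return nulls + nans + infinities;
    }

    /** Assumed relationship: NaN and Infinity are missing but still non-null. */
    long nonNullCount() {
        return nonMissing + nans + infinities;
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(
                1.0, null, Double.NaN, Double.POSITIVE_INFINITY,
                Double.NEGATIVE_INFINITY, 2.5);
        MissingCountSketch c = MissingCountSketch.of(column);
        System.out.println("missing=" + c.missingCount()
                + " nonMissing=" + c.nonMissing
                + " nonNull=" + c.nonNullCount());
    }
}
```

        For integral types such as Short, Integer, and Long the NaN and infinity branches can never be taken, which matches the note that those counts are always zero.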

      • getMin

        public T getMin()
        Deprecated.
      • getMax

        public T getMax()
        Deprecated.
      • getSum

        public T getSum()
        Deprecated.
      • getMean

        public Double getMean()
        Deprecated.
        Null, NaN, and Infinite values are ignored in this calculation.
        See Also:
        Arithmetic Mean
      • getVariance

        public Double getVariance()
        Deprecated.
        Variance is a measure of how far a set of numbers is spread out.

        Null, NaN, and Infinite values are ignored in this calculation.

        See Also:
        Variance
      • getStandardDeviation

        public Double getStandardDeviation()
        Deprecated.
        Standard Deviation is a measure of variation in a set of numbers. It is the square root of the variance.

        Null, NaN, and Infinite values are ignored in this calculation.

        See Also:
        Standard Deviation
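
        To make the relationship between mean, variance, and standard deviation concrete, here is a small standalone sketch that skips null, NaN, and infinite values and then takes the square root of the variance. Several things are assumptions not confirmed by this page: the sample (n - 1) denominator, the 0.0 results for empty or single-value input, and the use of Welford's online update instead of the compensated summation mentioned in the class description. The class name SpreadSketch and its method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Flink.
final class SpreadSketch {

    /**
     * Returns {mean, variance, standardDeviation} over the non-missing values.
     * Uses Welford's online update; the n - 1 denominator is an assumption.
     */
    static double[] meanVarianceStdDev(List<Double> column) {
        long n = 0;
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the mean
        for (Double v : column) {
            if (v == null || v.isNaN() || v.isInfinite()) {
                continue; // missing values are ignored
            }
            n++;
            double delta = v - mean;
            mean += delta / n;
            m2 += delta * (v - mean);
        }
        // Assumption: report 0.0 when fewer than two values are available.
        double variance = n > 1 ? m2 / (n - 1) : 0.0;
        return new double[] {mean, variance, Math.sqrt(variance)};
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(
                2.0, 4.0, null, 4.0, 4.0, Double.NaN,
                5.0, 5.0, Double.POSITIVE_INFINITY, 7.0, 9.0);
        double[] r = meanVarianceStdDev(column);
        System.out.println("mean=" + r[0] + " variance=" + r[1] + " stddev=" + r[2]);
    }
}
```

        The eight non-missing values in main have a mean of 5 and squared deviations summing to 32, so under the assumed sample denominator the variance is 32/7 and the standard deviation is its square root.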