Class EmptyFieldsCountAccumulator


  • public class EmptyFieldsCountAccumulator
    extends Object
    This program filters out lines of a CSV file that contain empty fields. While doing so, it counts the number of empty fields per column using a custom accumulator for vectors. In this context, an empty field is one that contains nothing or only whitespace characters such as space and tab.
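
The counting logic described above can be sketched without any Flink dependency. The class and method names below (EmptyFieldCounterSketch, isEmptyField, accept, add, merge) are hypothetical and chosen for illustration only. The real example registers its accumulator with the Flink runtime, which this sketch does not attempt to reproduce. The sketch assumes that a line is dropped as soon as one of its fields is empty, and that partial counts from parallel workers are combined by element-wise addition.

```java
import java.util.ArrayList;
import java.util.List;

/** Flink-free sketch of a per-column "empty field" counter (hypothetical names). */
final class EmptyFieldCounterSketch {

    // counts.get(i) holds the number of empty fields seen so far in column i
    private final List<Integer> counts = new ArrayList<>();

    /** A field is empty if it is null or contains only whitespace such as space or tab. */
    static boolean isEmptyField(String field) {
        return field == null || field.trim().isEmpty();
    }

    /** Increments the counter of the given column, growing the vector on demand. */
    void add(int column) {
        while (counts.size() <= column) {
            counts.add(0);
        }
        counts.set(column, counts.get(column) + 1);
    }

    /** Adds another partial result element-wise, as a merge of parallel counters would. */
    void merge(EmptyFieldCounterSketch other) {
        for (int i = 0; i < other.counts.size(); i++) {
            while (counts.size() <= i) {
                counts.add(0);
            }
            counts.set(i, counts.get(i) + other.counts.get(i));
        }
    }

    /** Counts the empty fields of one row; returns true only if the row has none. */
    boolean accept(String[] row) {
        boolean complete = true;
        for (int i = 0; i < row.length; i++) {
            if (isEmptyField(row[i])) {
                add(i);
                complete = false;
            }
        }
        return complete;
    }

    /** Returns a copy of the current per-column counts. */
    List<Integer> getCounts() {
        return new ArrayList<>(counts);
    }
}
```

Keeping the counts in a growable vector means the counter does not need to know the number of columns in advance, and merging two partial results stays a plain element-wise sum.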

    The input file is a plain-text CSV file with three columns that uses the semicolon as field separator and double quotes as field delimiters. See getDataSet(ExecutionEnvironment, ParameterTool) for configuration.
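
To make the input format concrete, here is a minimal sketch of how one such line could be split into fields. It is not the reader that getDataSet configures. The class name SemicolonCsvLineSketch and the method splitLine are hypothetical, and the sketch assumes that quoted fields never contain an escaped double quote.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits one semicolon-separated line whose fields may be wrapped in double quotes. */
final class SemicolonCsvLineSketch {

    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // A quote only toggles the quoting state; it is not part of the field.
                inQuotes = !inQuotes;
            } else if (c == ';' && !inQuotes) {
                // An unquoted semicolon ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field has no trailing separator.
        fields.add(current.toString());
        return fields;
    }
}
```

For a line such as "a;b";" ";"c" this yields three fields, the second of which contains only a space and would therefore count as empty under the definition above.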

    Usage: EmptyFieldsCountAccumulator --input <path> --output <path>

    This example shows how to use:

    • custom accumulators
    • tuple data types
    • inline-defined functions
    • naming large tuple types

    Note: All Flink DataSet APIs are deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application with the DataSet API, but you should migrate to the DataStream API, the Table API, or both. This class is retained for testing purposes.

    • Constructor Detail

      • EmptyFieldsCountAccumulator

        public EmptyFieldsCountAccumulator()