Class EmptyFieldsCountAccumulator
- java.lang.Object
  - org.apache.flink.examples.java.relational.EmptyFieldsCountAccumulator
public class EmptyFieldsCountAccumulator extends Object
This program filters lines with empty fields from a CSV file. In doing so, it counts the number of empty fields per column using a custom accumulator for vectors. In this context, empty fields are those that contain at most whitespace characters such as space and tab.
The input file is a plain-text CSV file with three columns, the semicolon as field separator, and double quotes as field delimiters. See getDataSet(ExecutionEnvironment, ParameterTool) for configuration.
Usage:
EmptyFieldsCountAccumulator --input <path> --output <path>
This example shows how to use:
- custom accumulators
- tuple data types
- inline-defined functions
- naming large tuple types
Note: All Flink DataSet APIs are deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application with the DataSet API, but you should move to the DataStream and/or Table API. This class is retained for testing purposes.
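The empty-field rule and the per-column counting described above can be sketched in plain Java, without any Flink dependency. This is a minimal illustration rather than the class's actual implementation: the class name EmptyFieldCountSketch and its methods are hypothetical, rows are assumed to be already split into fields (no CSV parsing), and it assumes that rows containing empty fields are the ones dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Flink.
final class EmptyFieldCountSketch {

    /** A field counts as empty if it is null or holds only whitespace. */
    static boolean isEmptyField(String field) {
        return field == null || field.trim().isEmpty();
    }

    /**
     * Returns the rows that have no empty field and increments
     * counts[col] for every empty field found in column col.
     * Assumption: rows with at least one empty field are dropped.
     */
    static List<String[]> filterAndCount(List<String[]> rows, int[] counts) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            boolean hasEmpty = false;
            for (int col = 0; col < row.length && col < counts.length; col++) {
                if (isEmptyField(row[col])) {
                    counts[col]++;
                    hasEmpty = true;
                }
            }
            if (!hasEmpty) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"a", "b", "c"},
                new String[] {"", "b", " \t"},
                new String[] {"a", null, "c"});
        int[] counts = new int[3]; // three columns, as in the described input
        List<String[]> kept = filterAndCount(rows, counts);
        // prints "1 row(s) kept, empty per column: [1, 1, 1]"
        System.out.println(kept.size() + " row(s) kept, empty per column: "
                + Arrays.toString(counts));
    }
}
```

In the Flink program itself the counts would be collected through an accumulator rather than a shared array, since the filter may run in several parallel tasks.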
Nested Class Summary
Nested Classes (modifier and type, class, description):
- static class EmptyFieldsCountAccumulator.EmptyFieldFilter: This function filters all incoming tuples that have one or more empty fields.
- static class EmptyFieldsCountAccumulator.StringTriple: It is recommended to use POJOs (Plain old Java objects) instead of TupleX for data types with many fields.
- static class EmptyFieldsCountAccumulator.VectorAccumulator: This accumulator maintains a vector of counts.
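To make "a vector of counts" concrete, the following is a rough plain-Java sketch of how such a structure might behave. It is not the VectorAccumulator class itself and does not implement Flink's Accumulator interface; the class VectorCountSketch, its method names, and the component-wise merge are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not Flink's VectorAccumulator.
final class VectorCountSketch {
    private final List<Integer> counts = new ArrayList<>();

    /** Increments the count at the given position, padding with zeros if needed. */
    void add(int position) {
        while (counts.size() <= position) {
            counts.add(0);
        }
        counts.set(position, counts.get(position) + 1);
    }

    /** Adds another vector's counts component-wise (assumed merge semantics). */
    void merge(VectorCountSketch other) {
        for (int i = 0; i < other.counts.size(); i++) {
            while (counts.size() <= i) {
                counts.add(0);
            }
            counts.set(i, counts.get(i) + other.counts.get(i));
        }
    }

    /** Returns a copy of the current counts. */
    List<Integer> getCounts() {
        return new ArrayList<>(counts);
    }
}
```

A merge step of this kind is what would let partial counts from separate parallel tasks be combined into one result per column.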
Constructor Summary
Constructors:
- EmptyFieldsCountAccumulator()