Class SplitDataProperties<T>

  • Type Parameters:
    T - The type of the DataSource on which the SplitDataProperties are defined.
    All Implemented Interfaces:
    org.apache.flink.api.common.operators.GenericDataSourceBase.SplitDataProperties<T>

    @Deprecated
    @PublicEvolving
    public class SplitDataProperties<T>
    extends Object
    implements org.apache.flink.api.common.operators.GenericDataSourceBase.SplitDataProperties<T>
    Deprecated.
    All Flink DataSet APIs are deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application with the DataSet API, but you should move to the DataStream API and/or the Table API.
    SplitDataProperties define data properties on the InputSplits generated by the InputFormat of a DataSource.

    InputSplits are units of input which are distributed among and assigned to parallel data source subtasks. SplitDataProperties can define that the elements which are generated by the associated InputFormat are:

    • Partitioned on one or more fields across InputSplits, i.e., all elements with the same (combination of) key(s) are located in the same input split.
    • Grouped on one or more fields within an InputSplit, i.e., all elements of an input split that have the same (combination of) key(s) are emitted in a single sequence one after the other.
    • Ordered on one or more fields within an InputSplit, i.e., all elements within an input split are in the defined order.

    IMPORTANT: SplitDataProperties can improve the execution of a program because certain data reorganization steps such as shuffling or sorting can be avoided. HOWEVER, if SplitDataProperties are not correctly defined, the result of the program might be wrong!
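The warning above can be illustrated with a minimal plain-Java sketch. It is not Flink code: the class `LocalCountSketch` and its method are hypothetical, and each inner list merely stands in for the keys read from one input split. The method counts keys split by split and never merges across splits, which is what effectively happens when a shuffle is skipped because the data was declared partitioned.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model (not Flink): each inner list holds the keys read from one input split. */
final class LocalCountSketch {

    /**
     * Counts keys per split and concatenates the per-split results without
     * merging across splits, mimicking an aggregation that skips the shuffle.
     */
    static List<Map.Entry<String, Integer>> countPerSplit(List<List<String>> splits) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (List<String> split : splits) {
            // LinkedHashMap keeps first-seen key order inside a split.
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (String key : split) {
                counts.merge(key, 1, Integer::sum);
            }
            out.addAll(counts.entrySet());
        }
        return out;
    }
}
```

With splits `[a, a]` and `[b]` the sketch yields one entry per key. With splits `[a]` and `[a, b]` it yields two separate partial counts for `a`, which is the kind of wrong result an incorrect partitioning declaration can produce.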

    See Also:
    InputSplit, InputFormat, DataSource, FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API)
    • Constructor Detail

      • SplitDataProperties

        public SplitDataProperties(org.apache.flink.api.common.typeinfo.TypeInformation<T> type)
        Deprecated.
        Creates SplitDataProperties for the given data type.
        Parameters:
        type - The data type of the SplitDataProperties.
      • SplitDataProperties

        public SplitDataProperties(DataSource<T> source)
        Deprecated.
        Creates SplitDataProperties for the given DataSource.
        Parameters:
        source - The DataSource for which the SplitDataProperties are created.
    • Method Detail

      • splitsPartitionedBy

        public SplitDataProperties<T> splitsPartitionedBy(int... partitionFields)
        Deprecated.
        Defines that data is partitioned across input splits on the fields defined by field positions. All records sharing the same key (combination) must be contained in a single input split.

        IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!

        Parameters:
        partitionFields - The field positions of the partitioning keys.
        Returns:
        This SplitDataProperties object.
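Because a wrong declaration can silently corrupt results, it may help to verify the property on sample data before declaring it. The following is a minimal sketch under stated assumptions: records are modelled as `int[]` tuples, each inner list stands in for one input split, and the class `SplitPartitionCheck` is hypothetical, not part of Flink.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java check (not Flink): records are int[] tuples, each inner list is one split. */
final class SplitPartitionCheck {

    /** Returns true if no key (combination) at the given field positions occurs in more than one split. */
    static boolean isPartitionedBy(List<List<int[]>> splits, int... keyFields) {
        // Remembers the index of the first split in which each key was seen.
        Map<List<Integer>, Integer> splitOfKey = new HashMap<>();
        for (int s = 0; s < splits.size(); s++) {
            for (int[] rec : splits.get(s)) {
                List<Integer> key = new ArrayList<>();
                for (int f : keyFields) {
                    key.add(rec[f]);
                }
                Integer previous = splitOfKey.putIfAbsent(key, s);
                if (previous != null && previous != s) {
                    return false; // same key found in two different splits
                }
            }
        }
        return true;
    }
}
```

If the check returns true for field position 0 on representative data, that is evidence (not proof) that declaring partitioning on field 0 is safe for that source.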
      • splitsPartitionedBy

        public SplitDataProperties<T> splitsPartitionedBy(String partitionMethodId,
                                                          int... partitionFields)
        Deprecated.
        Defines that data is partitioned using a specific partitioning method across input splits on the fields defined by field positions. All records sharing the same key (combination) must be contained in a single input split.

        IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!

        Parameters:
        partitionMethodId - An ID for the method that was used to partition the data across splits.
        partitionFields - The field positions of the partitioning keys.
        Returns:
        This SplitDataProperties object.
      • splitsPartitionedBy

        public SplitDataProperties<T> splitsPartitionedBy(String partitionFields)
        Deprecated.
        Defines that data is partitioned across input splits on the fields defined by field expressions. Multiple field expressions must be separated by the semicolon ';' character. All records sharing the same key (combination) must be contained in a single input split.

        IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!

        Parameters:
        partitionFields - The field expressions of the partitioning keys.
        Returns:
        This SplitDataProperties object.
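The semicolon-separated format described above can be sketched in plain Java. This only illustrates the delimiter convention; it is not Flink's own parser, and the class `FieldExpressionSketch` is hypothetical. The names `f0` and `f1` are the field names of Flink's Java tuple types, used here purely as sample expressions.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrates the ';' delimiter convention only; this is not Flink's parser. */
final class FieldExpressionSketch {

    /** Joins individual field expressions into one semicolon-separated string. */
    static String join(List<String> expressions) {
        return String.join(";", expressions);
    }

    /** Splits a semicolon-separated string into trimmed, non-empty field expressions. */
    static List<String> split(String fields) {
        List<String> result = new ArrayList<>();
        for (String part : fields.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }
}
```

Building the argument with a helper like `join` avoids accidentally using a comma or another separator between expressions.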
      • splitsPartitionedBy

        public SplitDataProperties<T> splitsPartitionedBy(String partitionMethodId,
                                                          String partitionFields)
        Deprecated.
        Defines that data is partitioned using an identifiable method across input splits on the fields defined by field expressions. Multiple field expressions must be separated by the semicolon ';' character. All records sharing the same key (combination) must be contained in a single input split.

        IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!

        Parameters:
        partitionMethodId - An ID for the method that was used to partition the data across splits.
        partitionFields - The field expressions of the partitioning keys.
        Returns:
        This SplitDataProperties object.
      • splitsGroupedBy

        public SplitDataProperties<T> splitsGroupedBy(int... groupFields)
        Deprecated.
        Defines that the data within an input split is grouped on the fields defined by the field positions. For each input split, all records sharing the same key (combination) must be emitted consecutively by the input format.

        IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!

        Parameters:
        groupFields - The field positions of the grouping keys.
        Returns:
        This SplitDataProperties object.
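Grouping is weaker than ordering: records with the same key must form one contiguous run, but the runs themselves may appear in any order. A minimal sketch of that condition, assuming records are modelled as `int[]` tuples and using a hypothetical class `SplitGroupCheck` that is not part of Flink:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java check (not Flink): the list holds the records of one split in emission order. */
final class SplitGroupCheck {

    /** Returns true if every key (combination) forms exactly one contiguous run in the split. */
    static boolean isGroupedBy(List<int[]> split, int... keyFields) {
        Set<List<Integer>> finished = new HashSet<>(); // keys whose run has already ended
        List<Integer> current = null;                  // key of the run in progress
        for (int[] rec : split) {
            List<Integer> key = new ArrayList<>();
            for (int f : keyFields) {
                key.add(rec[f]);
            }
            if (!key.equals(current)) {
                if (current != null) {
                    finished.add(current);
                }
                if (finished.contains(key)) {
                    return false; // key reappears after its run ended
                }
                current = key;
            }
        }
        return true;
    }
}
```

In this model, a split emitting keys 2, 1, 1 is grouped on that field although it is not sorted, while a split emitting 1, 2, 1 is not grouped.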
      • splitsGroupedBy

        public SplitDataProperties<T> splitsGroupedBy(String groupFields)
        Deprecated.
        Defines that the data within an input split is grouped on the fields defined by the field expressions. Multiple field expressions must be separated by the semicolon ';' character. For each input split, all records sharing the same key (combination) must be emitted consecutively by the input format.

        IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!

        Parameters:
        groupFields - The field expressions of the grouping keys.
        Returns:
        This SplitDataProperties object.
      • splitsOrderedBy

        public SplitDataProperties<T> splitsOrderedBy(int[] orderFields,
                                                      org.apache.flink.api.common.operators.Order[] orders)
        Deprecated.
        Defines that the data within an input split is sorted on the fields defined by the field positions in the specified orders. All records of an input split must be emitted by the input format in the defined order.

        IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!

        Parameters:
        orderFields - The field positions of the ordering keys.
        orders - The orders of the fields.
        Returns:
        This SplitDataProperties object.
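The ordering condition pairs each entry of orderFields with the entry of orders at the same index and compares records lexicographically. A minimal sketch under stated assumptions: records are `int[]` tuples, the class `SplitOrderCheck` is hypothetical, and the nested `Direction` enum is a local stand-in rather than Flink's `Order` type. The length check reflects the assumption that both arrays must have one entry per ordering key.

```java
import java.util.List;

/** Plain-Java check (not Flink): the list holds the records of one split in emission order. */
final class SplitOrderCheck {

    /** Local stand-in for a sort direction; this is not Flink's Order enum. */
    enum Direction { ASCENDING, DESCENDING }

    /** Lexicographic comparison of two records on the given fields and directions. */
    static int compare(int[] a, int[] b, int[] orderFields, Direction[] orders) {
        for (int i = 0; i < orderFields.length; i++) {
            int c = Integer.compare(a[orderFields[i]], b[orderFields[i]]);
            if (c != 0) {
                return orders[i] == Direction.ASCENDING ? c : -c;
            }
        }
        return 0;
    }

    /** Returns true if no pair of consecutive records violates the declared order. */
    static boolean isOrderedBy(List<int[]> split, int[] orderFields, Direction[] orders) {
        if (orderFields.length != orders.length) {
            throw new IllegalArgumentException("orderFields and orders must have the same length");
        }
        for (int i = 1; i < split.size(); i++) {
            if (compare(split.get(i - 1), split.get(i), orderFields, orders) > 0) {
                return false;
            }
        }
        return true;
    }
}
```

For example, the records (1, 9), (1, 5), (2, 7) satisfy an order on field 0 ascending and field 1 descending, but not an order that is ascending on both fields.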
      • splitsOrderedBy

        public SplitDataProperties<T> splitsOrderedBy(String orderFields,
                                                      org.apache.flink.api.common.operators.Order[] orders)
        Deprecated.
        Defines that the data within an input split is sorted on the fields defined by the field expressions in the specified orders. Multiple field expressions must be separated by the semicolon ';' character. All records of an input split must be emitted by the input format in the defined order.

        IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!

        Parameters:
        orderFields - The field expressions of the ordering keys.
        orders - The orders of the fields.
        Returns:
        This SplitDataProperties object.
      • getSplitPartitionKeys

        public int[] getSplitPartitionKeys()
        Deprecated.
        Specified by:
        getSplitPartitionKeys in interface org.apache.flink.api.common.operators.GenericDataSourceBase.SplitDataProperties<T>
      • getSplitPartitioner

        public org.apache.flink.api.common.functions.Partitioner<T> getSplitPartitioner()
        Deprecated.
        Specified by:
        getSplitPartitioner in interface org.apache.flink.api.common.operators.GenericDataSourceBase.SplitDataProperties<T>
      • getSplitGroupKeys

        public int[] getSplitGroupKeys()
        Deprecated.
        Specified by:
        getSplitGroupKeys in interface org.apache.flink.api.common.operators.GenericDataSourceBase.SplitDataProperties<T>
      • getSplitOrder

        public org.apache.flink.api.common.operators.Ordering getSplitOrder()
        Deprecated.
        Specified by:
        getSplitOrder in interface org.apache.flink.api.common.operators.GenericDataSourceBase.SplitDataProperties<T>