Class DataSet<T>

    • Constructor Detail

      • DataSet

        protected DataSet​(ExecutionEnvironment context,
                          org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInfo)
        Deprecated.
    • Method Detail

      • fillInType

        protected void fillInType​(org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInfo)
        Deprecated.
        Tries to fill in the type information. Type information can be filled in later when the program uses a type hint. This method checks whether the type information has been accessed before and does not allow modifications if it has. This ensures consistency by making sure that different parts of the operation do not assume different type information.
        Parameters:
        typeInfo - The type information to fill in.
        Throws:
        IllegalStateException - Thrown, if the type information has been accessed before.
      • getType

        public org.apache.flink.api.common.typeinfo.TypeInformation<T> getType()
        Deprecated.
        Returns the TypeInformation for the type of this DataSet.
        Returns:
        The TypeInformation for the type of this DataSet.
        See Also:
        TypeInformation
      • clean

        public <F> F clean​(F f)
        Deprecated.
      • map

        public <R> MapOperator<T,​R> map​(org.apache.flink.api.common.functions.MapFunction<T,​R> mapper)
        Deprecated.
        Applies a Map transformation on this DataSet.

        The transformation calls a MapFunction for each element of the DataSet. Each MapFunction call returns exactly one element.

        Parameters:
        mapper - The MapFunction that is called for each element of the DataSet.
        Returns:
        A MapOperator that represents the transformed DataSet.
        See Also:
        MapFunction, RichMapFunction, MapOperator
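        The one-in, one-out contract of the Map transformation can be sketched with plain java.util.stream, without a Flink runtime; the class and helper names here are hypothetical illustrations, not Flink API:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: the DataSet.map contract (exactly one output element
// per input element), shown with plain java.util.stream instead of a Flink job.
public class MapSketch {
    static <T, R> List<R> map(List<T> input, Function<T, R> mapper) {
        // Like MapFunction.map(T): called once per element, returns exactly one element.
        return input.stream().map(mapper).collect(Collectors.toList());
    }
}
```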
      • mapPartition

        public <R> MapPartitionOperator<T,​R> mapPartition​(org.apache.flink.api.common.functions.MapPartitionFunction<T,​R> mapPartition)
        Deprecated.
        Applies a Map-style operation to the entire partition of the data. The function is called once per parallel partition of the data, and the entire partition is available through the given Iterator. The number of elements that each instance of the MapPartition function sees is non-deterministic and depends on the parallelism of the operation.

        This function is intended for operations that cannot transform individual elements and that require no grouping of elements. To transform individual elements, the use of map() and flatMap() is preferable.

        Parameters:
        mapPartition - The MapPartitionFunction that is called for the full DataSet.
        Returns:
        A MapPartitionOperator that represents the transformed DataSet.
        See Also:
        MapPartitionFunction, MapPartitionOperator
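        The once-per-partition contract can be sketched in plain Java by simulating partitions as a list of lists; the class name and the BiConsumer shape are hypothetical, and in Flink the split across partitions is non-deterministic:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch of the mapPartition contract: the user function is
// invoked once per parallel partition and sees that partition's elements
// through an Iterator.
public class MapPartitionSketch {
    static <T, R> List<R> mapPartition(List<List<T>> partitions,
                                       BiConsumer<Iterator<T>, List<R>> fn) {
        List<R> out = new ArrayList<>();
        for (List<T> partition : partitions) {
            fn.accept(partition.iterator(), out); // one call per partition
        }
        return out;
    }
}
```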
      • flatMap

        public <R> FlatMapOperator<T,​R> flatMap​(org.apache.flink.api.common.functions.FlatMapFunction<T,​R> flatMapper)
        Deprecated.
        Applies a FlatMap transformation on a DataSet.

        The transformation calls a RichFlatMapFunction for each element of the DataSet. Each FlatMapFunction call can return any number of elements including none.

        Parameters:
        flatMapper - The FlatMapFunction that is called for each element of the DataSet.
        Returns:
        A FlatMapOperator that represents the transformed DataSet.
        See Also:
        RichFlatMapFunction, FlatMapOperator, DataSet
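        The zero-to-many output contract of FlatMap can be sketched with java.util.stream (no Flink dependency; the class name is a hypothetical illustration):

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the flatMap contract: each input element may produce
// any number of output elements, including none.
public class FlatMapSketch {
    static <T, R> List<R> flatMap(List<T> input, Function<T, Stream<R>> fn) {
        return input.stream().flatMap(fn).collect(Collectors.toList());
    }
}
```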
      • filter

        public FilterOperator<T> filter​(org.apache.flink.api.common.functions.FilterFunction<T> filter)
        Deprecated.
        Applies a Filter transformation on a DataSet.

        The transformation calls a RichFilterFunction for each element of the DataSet and retains only those elements for which the function returns true. Elements for which the function returns false are filtered out.

        Parameters:
        filter - The FilterFunction that is called for each element of the DataSet.
        Returns:
        A FilterOperator that represents the filtered DataSet.
        See Also:
        RichFilterFunction, FilterOperator, DataSet
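        The retain-on-true semantics can be sketched with java.util.stream (the class name is a hypothetical illustration, not Flink API):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of the filter contract: elements for which the predicate
// returns true are retained; all others are dropped.
public class FilterSketch {
    static <T> List<T> filter(List<T> input, Predicate<T> keep) {
        return input.stream().filter(keep).collect(Collectors.toList());
    }
}
```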
      • project

        public <OUT extends org.apache.flink.api.java.tuple.Tuple> ProjectOperator<?,​OUT> project​(int... fieldIndexes)
        Deprecated.
        Applies a Project transformation on a Tuple DataSet.

        Note: Only Tuple DataSets can be projected using field indexes.

        The transformation projects each Tuple of the DataSet onto a (sub)set of fields.

        Additional fields can be added to the projection by calling project(int[]).

        Note: With the current implementation, the Project transformation loses type information.

        Parameters:
        fieldIndexes - The field indexes of the input tuple that are retained. The order of fields in the output tuple corresponds to the order of field indexes.
        Returns:
        A ProjectOperator that represents the projected DataSet.
        See Also:
        Tuple, DataSet, ProjectOperator
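        The field-index projection semantics can be sketched in plain Java, modeling each tuple as a List (an assumption for illustration; Flink uses typed Tuple classes):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Project semantics: each "tuple" (modeled as a List)
// is mapped onto the chosen field indexes, in the order the indexes are given.
public class ProjectSketch {
    static List<List<Object>> project(List<List<Object>> tuples, int... fieldIndexes) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> tuple : tuples) {
            List<Object> projected = new ArrayList<>();
            for (int i : fieldIndexes) {
                projected.add(tuple.get(i)); // output order follows the index order
            }
            out.add(projected);
        }
        return out;
    }
}
```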
      • aggregate

        public AggregateOperator<T> aggregate​(Aggregations agg,
                                              int field)
        Deprecated.
        Applies an Aggregate transformation on a non-grouped Tuple DataSet.

        Note: Only Tuple DataSets can be aggregated. The transformation applies a built-in Aggregation on a specified field of a Tuple DataSet. Additional aggregation functions can be added to the resulting AggregateOperator by calling AggregateOperator.and(Aggregations, int).

        Parameters:
        agg - The built-in aggregation function that is computed.
        field - The index of the Tuple field on which the aggregation function is applied.
        Returns:
        An AggregateOperator that represents the aggregated DataSet.
        See Also:
        Tuple, Aggregations, AggregateOperator, DataSet
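        A built-in field aggregation such as SUM can be sketched in plain Java, with tuples modeled as int arrays (a hypothetical simplification; Flink operates on typed Tuple fields):

```java
import java.util.List;

// Hypothetical sketch of a built-in field aggregation (here: SUM) over a
// Tuple DataSet, with tuples modeled as int arrays.
public class AggregateSketch {
    static long sumField(List<int[]> tuples, int field) {
        long sum = 0;
        for (int[] tuple : tuples) {
            sum += tuple[field]; // aggregate the chosen tuple position
        }
        return sum;
    }
}
```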
      • sum

        public AggregateOperator<T> sum​(int field)
        Deprecated.
        Syntactic sugar for aggregate(SUM, field).
        Parameters:
        field - The index of the Tuple field on which the aggregation function is applied.
        Returns:
        An AggregateOperator that represents the summed DataSet.
        See Also:
        AggregateOperator
      • count

        public long count()
                   throws Exception
        Deprecated.
        Convenience method to get the count (number of elements) of a DataSet.
        Returns:
        A long integer that represents the number of elements in the data set.
        Throws:
        Exception
      • collect

        public List<T> collect()
                        throws Exception
        Deprecated.
        Convenience method to get the elements of a DataSet as a List. As a DataSet can contain a lot of data, this method should be used with caution.
        Returns:
        A List containing the elements of the DataSet
        Throws:
        Exception
      • reduce

        public ReduceOperator<T> reduce​(org.apache.flink.api.common.functions.ReduceFunction<T> reducer)
        Deprecated.
        Applies a Reduce transformation on a non-grouped DataSet.

        The transformation consecutively calls a RichReduceFunction until only a single element remains which is the result of the transformation. A ReduceFunction combines two elements into one new element of the same type.

        Parameters:
        reducer - The ReduceFunction that is applied on the DataSet.
        Returns:
        A ReduceOperator that represents the reduced DataSet.
        See Also:
        RichReduceFunction, ReduceOperator, DataSet
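        The two-in, one-out contract of Reduce can be sketched as a sequential fold in plain Java (the class name is a hypothetical illustration; Flink applies the function in a distributed, incremental fashion):

```java
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical sketch of the Reduce contract: the function repeatedly combines
// two elements into one new element of the same type until a single element remains.
public class ReduceSketch {
    static <T> T reduce(List<T> input, BinaryOperator<T> reducer) {
        T acc = input.get(0);
        for (int i = 1; i < input.size(); i++) {
            acc = reducer.apply(acc, input.get(i)); // two in, one out, same type
        }
        return acc;
    }
}
```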
      • reduceGroup

        public <R> GroupReduceOperator<T,​R> reduceGroup​(org.apache.flink.api.common.functions.GroupReduceFunction<T,​R> reducer)
        Deprecated.
        Applies a GroupReduce transformation on a non-grouped DataSet.

        The transformation calls a RichGroupReduceFunction once with the full DataSet. The GroupReduceFunction can iterate over all elements of the DataSet and emit any number of output elements including none.

        Parameters:
        reducer - The GroupReduceFunction that is applied on the DataSet.
        Returns:
        A GroupReduceOperator that represents the reduced DataSet.
        See Also:
        RichGroupReduceFunction, GroupReduceOperator, DataSet
      • combineGroup

        public <R> GroupCombineOperator<T,​R> combineGroup​(org.apache.flink.api.common.functions.GroupCombineFunction<T,​R> combiner)
        Deprecated.
        Applies a GroupCombineFunction on a non-grouped DataSet. A CombineFunction is similar to a GroupReduceFunction but does not perform a full data exchange. Instead, the CombineFunction calls the combine method once per partition for combining a group of results. This operator is suitable for combining values into an intermediate format before doing a proper groupReduce, where the data is shuffled across the nodes for further reduction.

        The GroupReduce operator can also be supplied with a combiner by implementing the RichGroupReduce function. The combine method of the RichGroupReduce function demands input and output type to be the same. The CombineFunction, in contrast, can have an arbitrary output type.
        Parameters:
        combiner - The GroupCombineFunction that is applied on the DataSet.
        Returns:
        A GroupCombineOperator which represents the combined DataSet.
      • minBy

        public ReduceOperator<T> minBy​(int... fields)
        Deprecated.
        Selects an element with minimum value.

        The minimum is computed over the specified fields in lexicographical order.

        Example 1: Given a data set with elements [0, 1], [1, 0], the results will be:

        • minBy(0): [0, 1]
        • minBy(1): [1, 0]

        Example 2: Given a data set with elements [0, 0], [0, 1], the results will be:

        • minBy(0, 1): [0, 0]

        If multiple values with minimum value at the specified fields exist, a random one will be picked.

        Internally, this operation is implemented as a ReduceFunction.

        Parameters:
        fields - Field positions to compute the minimum over
        Returns:
        A ReduceOperator representing the minimum
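        The lexicographical field comparison described above can be sketched in plain Java, with tuples modeled as int arrays (a hypothetical simplification); the examples from the text are reproduced in the sketch's behavior:

```java
import java.util.List;

// Hypothetical sketch of minBy semantics: the minimum is taken over the given
// field positions, compared in lexicographical order.
public class MinBySketch {
    static int[] minBy(List<int[]> tuples, int... fields) {
        int[] best = tuples.get(0);
        for (int[] candidate : tuples.subList(1, tuples.size())) {
            for (int f : fields) {
                if (candidate[f] < best[f]) { best = candidate; break; } // strictly smaller: take it
                if (candidate[f] > best[f]) { break; }                   // strictly larger: keep current
                // equal on this field: fall through and compare the next field
            }
        }
        return best;
    }
}
```

        maxBy is symmetric, with the comparisons reversed.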
      • maxBy

        public ReduceOperator<T> maxBy​(int... fields)
        Deprecated.
        Selects an element with maximum value.

        The maximum is computed over the specified fields in lexicographical order.

        Example 1: Given a data set with elements [0, 1], [1, 0], the results will be:

        • maxBy(0): [1, 0]
        • maxBy(1): [0, 1]

        Example 2: Given a data set with elements [0, 0], [0, 1], the results will be:

        • maxBy(0, 1): [0, 1]

        If multiple values with maximum value at the specified fields exist, a random one will be picked.

        Internally, this operation is implemented as a ReduceFunction.

        Parameters:
        fields - Field positions to compute the maximum over
        Returns:
        A ReduceOperator representing the maximum
      • first

        public GroupReduceOperator<T,​T> first​(int n)
        Deprecated.
        Returns a new set containing the first n elements in this DataSet.
        Parameters:
        n - The desired number of elements.
        Returns:
        A GroupReduceOperator that represents the DataSet containing the first n elements.
      • distinct

        public <K> DistinctOperator<T> distinct​(org.apache.flink.api.java.functions.KeySelector<T,​K> keyExtractor)
        Deprecated.
        Returns a distinct set of a DataSet using a KeySelector function.

        The KeySelector function is called for each element of the DataSet and extracts a single key value on which the decision is made if two items are distinct or not.

        Parameters:
        keyExtractor - The KeySelector function which extracts the key values from the DataSet on which the distinction of the DataSet is decided.
        Returns:
        A DistinctOperator that represents the distinct DataSet.
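        The key-extractor-based distinction can be sketched in plain Java (the class name is a hypothetical illustration; keeping the first occurrence per key is this sketch's choice, not a guarantee Flink documents):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of distinct with a KeySelector: two elements are
// considered duplicates when their extracted keys are equal.
public class DistinctSketch {
    static <T, K> List<T> distinct(List<T> input, Function<T, K> keyExtractor) {
        Set<K> seen = new HashSet<>();
        List<T> out = new ArrayList<>();
        for (T element : input) {
            if (seen.add(keyExtractor.apply(element))) { // first element with this key wins
                out.add(element);
            }
        }
        return out;
    }
}
```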
      • distinct

        public DistinctOperator<T> distinct​(int... fields)
        Deprecated.
        Returns a distinct set of a Tuple DataSet using field position keys.

        The field position keys specify the fields of Tuples on which the decision is made if two Tuples are distinct or not.

        Note: Field position keys can only be specified for Tuple DataSets.

        Parameters:
        fields - One or more field positions on which the distinction of the DataSet is decided.
        Returns:
        A DistinctOperator that represents the distinct DataSet.
      • distinct

        public DistinctOperator<T> distinct​(String... fields)
        Deprecated.
        Returns a distinct set of a DataSet using expression keys.

        The field expression keys specify the fields of a CompositeType (e.g., Tuple or Pojo type) on which the decision is made if two elements are distinct or not. In case of an AtomicType, only the wildcard expression ("*") is valid.

        Parameters:
        fields - One or more field expressions on which the distinction of the DataSet is decided.
        Returns:
        A DistinctOperator that represents the distinct DataSet.
      • distinct

        public DistinctOperator<T> distinct()
        Deprecated.
        Returns a distinct set of a DataSet.

        If the input is a CompositeType (Tuple or Pojo type), distinct is performed on all fields and each field must be a key type.

        Returns:
        A DistinctOperator that represents the distinct DataSet.
      • join

        public <R> JoinOperator.JoinOperatorSets<T,​R> join​(DataSet<R> other)
        Deprecated.
        Initiates a Join transformation.

        A Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine joining elements into one DataSet.

        This method returns a JoinOperator.JoinOperatorSets on which one of the where methods can be called to define the join key of the first joining (i.e., this) DataSet.

        Parameters:
        other - The other DataSet with which this DataSet is joined.
        Returns:
        A JoinOperatorSets to continue the definition of the Join transformation.
        See Also:
        JoinOperator.JoinOperatorSets, DataSet
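        The key-equality pairing behind join(...).where(...).equalTo(...) can be sketched as a single-threaded hash join in plain Java (hypothetical names; Flink additionally shuffles the data and chooses a physical join strategy):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of inner-join semantics: emit one pair for every
// left/right combination whose extracted keys are equal.
public class JoinSketch {
    static <L, R, K> List<Object[]> join(List<L> left, List<R> right,
                                         Function<L, K> leftKey, Function<R, K> rightKey) {
        Map<K, List<R>> index = new HashMap<>();
        for (R r : right) {
            index.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
        }
        List<Object[]> out = new ArrayList<>();
        for (L l : left) {
            for (R r : index.getOrDefault(leftKey.apply(l), List.of())) {
                out.add(new Object[]{l, r}); // one pair per key-equal match
            }
        }
        return out;
    }
}
```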
      • join

        public <R> JoinOperator.JoinOperatorSets<T,​R> join​(DataSet<R> other,
                                                                 org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint strategy)
        Deprecated.
        Initiates a Join transformation.

        A Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine joining elements into one DataSet.

        This method returns a JoinOperator.JoinOperatorSets on which one of the where methods can be called to define the join key of the first joining (i.e., this) DataSet.

        Parameters:
        other - The other DataSet with which this DataSet is joined.
        strategy - The strategy that should be used to execute the join. If null is given, then the optimizer will pick the join strategy.
        Returns:
        A JoinOperatorSets to continue the definition of the Join transformation.
        See Also:
        JoinOperator.JoinOperatorSets, DataSet
      • joinWithTiny

        public <R> JoinOperator.JoinOperatorSets<T,​R> joinWithTiny​(DataSet<R> other)
        Deprecated.
        Initiates a Join transformation.

        A Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine joining elements into one DataSet.

        This method also gives the hint to the optimizer that the second DataSet to join is much smaller than the first one.

        This method returns a JoinOperator.JoinOperatorSets on which JoinOperator.JoinOperatorSets.where(String...) needs to be called to define the join key of the first joining (i.e., this) DataSet.

        Parameters:
        other - The other DataSet with which this DataSet is joined.
        Returns:
        A JoinOperatorSets to continue the definition of the Join transformation.
        See Also:
        JoinOperator.JoinOperatorSets, DataSet
      • joinWithHuge

        public <R> JoinOperator.JoinOperatorSets<T,​R> joinWithHuge​(DataSet<R> other)
        Deprecated.
        Initiates a Join transformation.

        A Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine joining elements into one DataSet.

        This method also gives the hint to the optimizer that the second DataSet to join is much larger than the first one.

        This method returns a JoinOperator.JoinOperatorSets on which one of the where methods can be called to define the join key of the first joining (i.e., this) DataSet.

        Parameters:
        other - The other DataSet with which this DataSet is joined.
        Returns:
        A JoinOperatorSets to continue the definition of the Join transformation.
        See Also:
        JoinOperator.JoinOperatorSets, DataSet
      • leftOuterJoin

        public <R> JoinOperatorSetsBase<T,​R> leftOuterJoin​(DataSet<R> other)
        Deprecated.
        Initiates a Left Outer Join transformation.

        An Outer Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine joining elements into one DataSet.

        Elements of the left DataSet (i.e. this) that do not have a matching element on the other side are joined with null and emitted to the resulting DataSet.

        Parameters:
        other - The other DataSet with which this DataSet is joined.
        Returns:
        A JoinOperatorSetsBase to continue the definition of the Join transformation.
        See Also:
        JoinOperatorSetsBase, DataSet
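        The null-padding behavior of a left outer join can be sketched in plain Java (hypothetical names; this nested-loop form is for illustration only, not how Flink executes it):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch of left outer join semantics: every left element appears
// in the result; when no right element has an equal key, it is paired with null.
public class LeftOuterJoinSketch {
    static <L, R, K> List<Object[]> leftOuterJoin(List<L> left, List<R> right,
                                                  Function<L, K> leftKey,
                                                  Function<R, K> rightKey) {
        List<Object[]> out = new ArrayList<>();
        for (L l : left) {
            boolean matched = false;
            for (R r : right) {
                if (Objects.equals(leftKey.apply(l), rightKey.apply(r))) {
                    out.add(new Object[]{l, r});
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Object[]{l, null}); // unmatched left side joins with null
            }
        }
        return out;
    }
}
```

        rightOuterJoin pads the unmatched right side instead, and fullOuterJoin pads both sides.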
      • leftOuterJoin

        public <R> JoinOperatorSetsBase<T,​R> leftOuterJoin​(DataSet<R> other,
                                                                 org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint strategy)
        Deprecated.
        Initiates a Left Outer Join transformation.

        An Outer Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine joining elements into one DataSet.

        Elements of the left DataSet (i.e. this) that do not have a matching element on the other side are joined with null and emitted to the resulting DataSet.

        Parameters:
        other - The other DataSet with which this DataSet is joined.
        strategy - The strategy that should be used to execute the join. If null is given, then the optimizer will pick the join strategy.
        Returns:
        A JoinOperatorSetsBase to continue the definition of the Join transformation.
        See Also:
        JoinOperatorSetsBase, DataSet
      • rightOuterJoin

        public <R> JoinOperatorSetsBase<T,​R> rightOuterJoin​(DataSet<R> other)
        Deprecated.
        Initiates a Right Outer Join transformation.

        An Outer Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine joining elements into one DataSet.

        Elements of the right DataSet (i.e. other) that do not have a matching element on this side are joined with null and emitted to the resulting DataSet.

        Parameters:
        other - The other DataSet with which this DataSet is joined.
        Returns:
        A JoinOperatorSetsBase to continue the definition of the Join transformation.
        See Also:
        JoinOperatorSetsBase, DataSet
      • rightOuterJoin

        public <R> JoinOperatorSetsBase<T,​R> rightOuterJoin​(DataSet<R> other,
                                                                  org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint strategy)
        Deprecated.
        Initiates a Right Outer Join transformation.

        An Outer Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine joining elements into one DataSet.

        Elements of the right DataSet (i.e. other) that do not have a matching element on this side are joined with null and emitted to the resulting DataSet.

        Parameters:
        other - The other DataSet with which this DataSet is joined.
        strategy - The strategy that should be used to execute the join. If null is given, then the optimizer will pick the join strategy.
        Returns:
        A JoinOperatorSetsBase to continue the definition of the Join transformation.
        See Also:
        JoinOperatorSetsBase, DataSet
      • fullOuterJoin

        public <R> JoinOperatorSetsBase<T,​R> fullOuterJoin​(DataSet<R> other)
        Deprecated.
        Initiates a Full Outer Join transformation.

        An Outer Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine joining elements into one DataSet.

        Elements of both DataSets that do not have a matching element on the opposing side are joined with null and emitted to the resulting DataSet.

        Parameters:
        other - The other DataSet with which this DataSet is joined.
        Returns:
        A JoinOperatorSetsBase to continue the definition of the Join transformation.
        See Also:
        JoinOperatorSetsBase, DataSet
      • fullOuterJoin

        public <R> JoinOperatorSetsBase<T,​R> fullOuterJoin​(DataSet<R> other,
                                                                 org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint strategy)
        Deprecated.
        Initiates a Full Outer Join transformation.

        An Outer Join transformation joins the elements of two DataSets on key equality and provides multiple ways to combine joining elements into one DataSet.

        Elements of both DataSets that do not have a matching element on the opposing side are joined with null and emitted to the resulting DataSet.

        Parameters:
        other - The other DataSet with which this DataSet is joined.
        strategy - The strategy that should be used to execute the join. If null is given, then the optimizer will pick the join strategy.
        Returns:
        A JoinOperatorSetsBase to continue the definition of the Join transformation.
        See Also:
        JoinOperatorSetsBase, DataSet
      • coGroup

        public <R> CoGroupOperator.CoGroupOperatorSets<T,​R> coGroup​(DataSet<R> other)
        Deprecated.
        Initiates a CoGroup transformation.

        A CoGroup transformation combines the elements of two DataSets into one DataSet. It groups each DataSet individually on a key and gives groups of both DataSets with equal keys together into a RichCoGroupFunction. If a DataSet has a group with no matching key in the other DataSet, the CoGroupFunction is called with an empty group for the non-existing group.

        The CoGroupFunction can iterate over the elements of both groups and return any number of elements including none.

        This method returns a CoGroupOperator.CoGroupOperatorSets on which one of the where methods can be called to define the join key of the first joining (i.e., this) DataSet.

        Parameters:
        other - The other DataSet of the CoGroup transformation.
        Returns:
        A CoGroupOperatorSets to continue the definition of the CoGroup transformation.
        See Also:
        CoGroupOperator.CoGroupOperatorSets, CoGroupOperator, DataSet
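        The once-per-key, possibly-empty-group contract of CoGroup can be sketched in plain Java (hypothetical names and interface shape; Flink performs the grouping in a distributed fashion):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of CoGroup semantics: both inputs are grouped by key,
// and the user function is called once per key with BOTH groups -- an empty
// group when one side has no elements for that key (unlike an inner join).
public class CoGroupSketch {
    interface CoGroupFn<A, B, R> { void coGroup(List<A> first, List<B> second, List<R> out); }

    static <A, B, K, R> List<R> coGroup(List<A> first, List<B> second,
                                        Function<A, K> k1, Function<B, K> k2,
                                        CoGroupFn<A, B, R> fn) {
        Map<K, List<A>> g1 = new HashMap<>();
        first.forEach(a -> g1.computeIfAbsent(k1.apply(a), k -> new ArrayList<>()).add(a));
        Map<K, List<B>> g2 = new HashMap<>();
        second.forEach(b -> g2.computeIfAbsent(k2.apply(b), k -> new ArrayList<>()).add(b));
        Set<K> keys = new HashSet<>(g1.keySet());
        keys.addAll(g2.keySet());
        List<R> out = new ArrayList<>();
        for (K key : keys) { // one call per key, possibly with an empty group
            fn.coGroup(g1.getOrDefault(key, List.of()), g2.getOrDefault(key, List.of()), out);
        }
        return out;
    }
}
```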
      • cross

        public <R> CrossOperator.DefaultCross<T,​R> cross​(DataSet<R> other)
        Deprecated.
        Initiates a Cross transformation.

        A Cross transformation combines the elements of two DataSets into one DataSet. It builds all pair combinations of elements of both DataSets, i.e., it builds a Cartesian product.

        The resulting CrossOperator.DefaultCross wraps each pair of crossed elements into a Tuple2, with the element of the first input being the first field of the tuple and the element of the second input being the second field of the tuple.

        Call CrossOperator.DefaultCross.with(org.apache.flink.api.common.functions.CrossFunction) to define a CrossFunction which is called for each pair of crossed elements. The CrossFunction returns exactly one element for each pair of input elements.

        Parameters:
        other - The other DataSet with which this DataSet is crossed.
        Returns:
        A DefaultCross that returns a Tuple2 for each pair of crossed elements.
        See Also:
        CrossOperator.DefaultCross, CrossFunction, DataSet, Tuple2
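        The Cartesian-product semantics can be sketched in plain Java, modeling each pair as a two-element Object array (a hypothetical stand-in for Flink's Tuple2):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Cross semantics: build every pair combination of the
// two inputs, i.e., the Cartesian product.
public class CrossSketch {
    static <A, B> List<Object[]> cross(List<A> first, List<B> second) {
        List<Object[]> out = new ArrayList<>();
        for (A a : first) {
            for (B b : second) {
                out.add(new Object[]{a, b}); // first input element, then second
            }
        }
        return out;
    }
}
```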
      • crossWithTiny

        public <R> CrossOperator.DefaultCross<T,​R> crossWithTiny​(DataSet<R> other)
        Deprecated.
        Initiates a Cross transformation.

        A Cross transformation combines the elements of two DataSets into one DataSet. It builds all pair combinations of elements of both DataSets, i.e., it builds a Cartesian product. This method also gives the hint to the optimizer that the second DataSet to cross is much smaller than the first one.

        The resulting CrossOperator.DefaultCross wraps each pair of crossed elements into a Tuple2, with the element of the first input being the first field of the tuple and the element of the second input being the second field of the tuple.

        Call CrossOperator.DefaultCross.with(org.apache.flink.api.common.functions.CrossFunction) to define a CrossFunction which is called for each pair of crossed elements. The CrossFunction returns exactly one element for each pair of input elements.

        Parameters:
        other - The other DataSet with which this DataSet is crossed.
        Returns:
        A DefaultCross that returns a Tuple2 for each pair of crossed elements.
        See Also:
        CrossOperator.DefaultCross, CrossFunction, DataSet, Tuple2
      • crossWithHuge

        public <R> CrossOperator.DefaultCross<T,​R> crossWithHuge​(DataSet<R> other)
        Deprecated.
        Initiates a Cross transformation.

        A Cross transformation combines the elements of two DataSets into one DataSet. It builds all pair combinations of elements of both DataSets, i.e., it builds a Cartesian product. This method also gives the hint to the optimizer that the second DataSet to cross is much larger than the first one.

        The resulting CrossOperator.DefaultCross wraps each pair of crossed elements into a Tuple2, with the element of the first input being the first field of the tuple and the element of the second input being the second field of the tuple.

        Call CrossOperator.DefaultCross.with(org.apache.flink.api.common.functions.CrossFunction) to define a CrossFunction which is called for each pair of crossed elements. The CrossFunction returns exactly one element for each pair of input elements.

        Parameters:
        other - The other DataSet with which this DataSet is crossed.
        Returns:
        A DefaultCross that returns a Tuple2 for each pair of crossed elements.
        See Also:
        CrossOperator.DefaultCross, CrossFunction, DataSet, Tuple2
      • iterate

        public IterativeDataSet<T> iterate​(int maxIterations)
        Deprecated.
        Initiates an iterative part of the program that executes multiple times and feeds back data sets. The iterative part needs to be closed by calling IterativeDataSet.closeWith(DataSet). The data set given to the closeWith(DataSet) method is the data set that will be fed back and used as the input to the next iteration. The return value of the closeWith(DataSet) method is the resulting data set after the iteration has terminated.

        An example of an iterative computation is as follows:

        
         DataSet<Double> input = ...;
        
         DataSet<Double> startOfIteration = input.iterate(10);
         DataSet<Double> toBeFedBack = startOfIteration
                                       .map(new MyMapper())
                                       .groupBy(...).reduceGroup(new MyReducer());
         DataSet<Double> result = startOfIteration.closeWith(toBeFedBack);
         

        The iteration has a maximum number of times that it executes. A dynamic termination can be realized by using a termination criterion (see IterativeDataSet.closeWith(DataSet, DataSet)).

        Parameters:
        maxIterations - The maximum number of times that the iteration is executed.
        Returns:
        An IterativeDataSet that marks the start of the iterative part and needs to be closed by IterativeDataSet.closeWith(DataSet).
        See Also:
        IterativeDataSet
      • iterateDelta

        public <R> DeltaIteration<T,​R> iterateDelta​(DataSet<R> workset,
                                                          int maxIterations,
                                                          int... keyPositions)
        Deprecated.
        Initiates a delta iteration. A delta iteration is similar to a regular iteration (as started by iterate(int)), but maintains state across the individual iteration steps. The solution set, which represents the current state at the beginning of each iteration, can be obtained via DeltaIteration.getSolutionSet(). It can be accessed by joining (or CoGrouping) with it. The DataSet that represents the workset of an iteration can be obtained via DeltaIteration.getWorkset(). The solution set is updated by producing a delta for it, which is merged into the solution set at the end of each iteration step.

        The delta iteration must be closed by calling DeltaIteration.closeWith(DataSet, DataSet). The two parameters are the delta for the solution set and the new workset (the data set that will be fed back). The return value of the closeWith(DataSet, DataSet) method is the resulting data set after the iteration has terminated. Delta iterations terminate when the feed back data set (the workset) is empty. In addition, a maximum number of steps is given as a fall back termination guard.

        Elements in the solution set are uniquely identified by a key. When merging the solution set delta, contained elements with the same key are replaced.

        NOTE: Delta iterations currently support only tuple-valued data types. This restriction will be removed in the future. The key is specified by the tuple position.

        A code example for a delta iteration is as follows:

        
         DeltaIteration<Tuple2<Long, Long>, Tuple2<Long, Long>> iteration =
                                                          initialState.iterateDelta(initialFeedbackSet, 100, 0);
        
         DataSet<Tuple2<Long, Long>> delta = iteration.groupBy(0).aggregate(Aggregations.AVG, 1)
                                                      .join(iteration.getSolutionSet()).where(0).equalTo(0)
                                                      .flatMap(new ProjectAndFilter());
        
         DataSet<Tuple2<Long, Long>> feedBack = delta.join(someOtherSet).where(...).equalTo(...).with(...);
        
         // close the delta iteration (delta and new workset are identical)
         DataSet<Tuple2<Long, Long>> result = iteration.closeWith(delta, feedBack);
         
        Parameters:
        workset - The initial version of the data set that is fed back to the next iteration step (the workset).
        maxIterations - The maximum number of iteration steps, as a fall back safeguard.
        keyPositions - The positions of the tuple fields that are used as the key of the solution set.
        Returns:
        The DeltaIteration that marks the start of a delta iteration.
        See Also:
        DeltaIteration
      • runOperation

        public <X> DataSet<X> runOperation​(CustomUnaryOperation<T,​X> operation)
        Deprecated.
        Runs a CustomUnaryOperation on the data set. Custom operations are typically complex operators that are composed of multiple steps.
        Parameters:
        operation - The operation to run.
        Returns:
        The data set produced by the operation.
      • union

        public UnionOperator<T> union​(DataSet<T> other)
        Deprecated.
        Creates a union of this DataSet with another DataSet. The other DataSet must be of the same data type.
        Parameters:
        other - The other DataSet which is unioned with the current DataSet.
        Returns:
        The resulting DataSet.
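        As a minimal sketch (assuming the deprecated DataSet API is on the classpath and a local ExecutionEnvironment is used; the class name UnionExample is illustrative), a union simply concatenates two data sets of the same type:

        ```java
        import java.util.Collections;
        import java.util.List;

        import org.apache.flink.api.java.DataSet;
        import org.apache.flink.api.java.ExecutionEnvironment;

        public class UnionExample {

            // Unions two integer data sets and returns the sorted result.
            static List<Integer> run() throws Exception {
                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

                DataSet<Integer> first = env.fromElements(1, 2);
                DataSet<Integer> second = env.fromElements(3, 4);

                // union() concatenates the two data sets; duplicates are NOT removed
                List<Integer> result = first.union(second).collect();
                Collections.sort(result); // element order after a union is not deterministic
                return result;
            }

            public static void main(String[] args) throws Exception {
                System.out.println(run());
            }
        }
        ```

        Note that union keeps duplicate elements; apply distinct() afterwards if set semantics are needed.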
      • partitionByHash

        public PartitionOperator<T> partitionByHash​(int... fields)
        Deprecated.
        Hash-partitions a DataSet on the specified key fields.

        Important: This operation shuffles the whole DataSet over the network and can take a significant amount of time.

        Parameters:
        fields - The field indexes on which the DataSet is hash-partitioned.
        Returns:
        The partitioned DataSet.
      • partitionByHash

        public PartitionOperator<T> partitionByHash​(String... fields)
        Deprecated.
        Hash-partitions a DataSet on the specified key fields.

        Important: This operation shuffles the whole DataSet over the network and can take a significant amount of time.

        Parameters:
        fields - The field expressions on which the DataSet is hash-partitioned.
        Returns:
        The partitioned DataSet.
      • partitionByHash

        public <K extends Comparable<K>> PartitionOperator<T> partitionByHash​(org.apache.flink.api.java.functions.KeySelector<T,​K> keyExtractor)
        Deprecated.
        Partitions a DataSet using the specified KeySelector.

        Important: This operation shuffles the whole DataSet over the network and can take a significant amount of time.

        Parameters:
        keyExtractor - The KeySelector with which the DataSet is hash-partitioned.
        Returns:
        The partitioned DataSet.
        See Also:
        KeySelector
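        A minimal sketch of the field-index variant (the data, class name, and parallelism below are illustrative, not part of the API):

        ```java
        import java.util.List;

        import org.apache.flink.api.java.DataSet;
        import org.apache.flink.api.java.ExecutionEnvironment;
        import org.apache.flink.api.java.tuple.Tuple2;

        public class HashPartitionExample {

            static List<Tuple2<Long, String>> run() throws Exception {
                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
                env.setParallelism(4);

                DataSet<Tuple2<Long, String>> data = env.fromElements(
                        Tuple2.of(1L, "a"), Tuple2.of(2L, "b"), Tuple2.of(1L, "c"));

                // hash-partition on the first tuple field: all records with the same
                // key are routed to the same parallel instance of the next operator
                return data.partitionByHash(0).collect();
            }

            public static void main(String[] args) throws Exception {
                System.out.println(run());
            }
        }
        ```

        Partitioning changes only where records are processed, not their values, so the collected result still contains all three tuples.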
      • partitionByRange

        public PartitionOperator<T> partitionByRange​(int... fields)
        Deprecated.
        Range-partitions a DataSet on the specified key fields.

        Important: This operation requires an extra pass over the DataSet to compute the range boundaries and shuffles the whole DataSet over the network. This can take a significant amount of time.

        Parameters:
        fields - The field indexes on which the DataSet is range-partitioned.
        Returns:
        The partitioned DataSet.
      • partitionByRange

        public PartitionOperator<T> partitionByRange​(String... fields)
        Deprecated.
        Range-partitions a DataSet on the specified key fields.

        Important: This operation requires an extra pass over the DataSet to compute the range boundaries and shuffles the whole DataSet over the network. This can take a significant amount of time.

        Parameters:
        fields - The field expressions on which the DataSet is range-partitioned.
        Returns:
        The partitioned DataSet.
      • partitionByRange

        public <K extends Comparable<K>> PartitionOperator<T> partitionByRange​(org.apache.flink.api.java.functions.KeySelector<T,​K> keyExtractor)
        Deprecated.
        Range-partitions a DataSet using the specified KeySelector.

        Important: This operation requires an extra pass over the DataSet to compute the range boundaries and shuffles the whole DataSet over the network. This can take a significant amount of time.

        Parameters:
        keyExtractor - The KeySelector with which the DataSet is range-partitioned.
        Returns:
        The partitioned DataSet.
        See Also:
        KeySelector
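        A minimal sketch of range partitioning on a field index (data and class name are illustrative):

        ```java
        import java.util.List;

        import org.apache.flink.api.java.DataSet;
        import org.apache.flink.api.java.ExecutionEnvironment;
        import org.apache.flink.api.java.tuple.Tuple2;

        public class RangePartitionExample {

            static List<Tuple2<Integer, String>> run() throws Exception {
                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
                env.setParallelism(2);

                DataSet<Tuple2<Integer, String>> data = env.fromElements(
                        Tuple2.of(5, "e"), Tuple2.of(1, "a"), Tuple2.of(3, "c"), Tuple2.of(9, "i"));

                // range-partition on field 0: Flink samples the data to compute the
                // range boundaries, then routes low keys to one parallel instance
                // and high keys to the other
                return data.partitionByRange(0).collect();
            }

            public static void main(String[] args) throws Exception {
                System.out.println(run());
            }
        }
        ```

        Range partitioning is typically combined with sortPartition() to produce globally ordered output across partitions.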
      • partitionCustom

        public <K> PartitionOperator<T> partitionCustom​(org.apache.flink.api.common.functions.Partitioner<K> partitioner,
                                                        int field)
        Deprecated.
        Partitions a tuple DataSet on the specified key fields using a custom partitioner. This method takes the key position to partition on, and a partitioner that accepts the key type.

        Note: This method works only on single field keys.

        Parameters:
        partitioner - The partitioner to assign partitions to keys.
        field - The field index on which the DataSet is to be partitioned.
        Returns:
        The partitioned DataSet.
      • partitionCustom

        public <K> PartitionOperator<T> partitionCustom​(org.apache.flink.api.common.functions.Partitioner<K> partitioner,
                                                        String field)
        Deprecated.
        Partitions a POJO DataSet on the specified key fields using a custom partitioner. This method takes the key expression to partition on, and a partitioner that accepts the key type.

        Note: This method works only on single field keys.

        Parameters:
        partitioner - The partitioner to assign partitions to keys.
        field - The field expression on which the DataSet is to be partitioned.
        Returns:
        The partitioned DataSet.
      • partitionCustom

        public <K extends Comparable<K>> PartitionOperator<T> partitionCustom​(org.apache.flink.api.common.functions.Partitioner<K> partitioner,
                                                                              org.apache.flink.api.java.functions.KeySelector<T,​K> keyExtractor)
        Deprecated.
        Partitions a DataSet on the key returned by the selector, using a custom partitioner. This method takes the key selector to get the key to partition on, and a partitioner that accepts the key type.

        Note: This method works only on single field keys, i.e. the selector cannot return tuples of fields.

        Parameters:
        partitioner - The partitioner to assign partitions to keys.
        keyExtractor - The KeySelector with which the DataSet is partitioned.
        Returns:
        The partitioned DataSet.
        See Also:
        KeySelector
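        A minimal sketch of the field-index variant with a modulo partitioner (the partitioning scheme and class name are illustrative):

        ```java
        import java.util.List;

        import org.apache.flink.api.common.functions.Partitioner;
        import org.apache.flink.api.java.DataSet;
        import org.apache.flink.api.java.ExecutionEnvironment;
        import org.apache.flink.api.java.tuple.Tuple2;

        public class CustomPartitionExample {

            static List<Tuple2<Long, String>> run() throws Exception {
                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
                env.setParallelism(3);

                DataSet<Tuple2<Long, String>> data = env.fromElements(
                        Tuple2.of(0L, "a"), Tuple2.of(1L, "b"), Tuple2.of(2L, "c"));

                // the custom partitioner receives the single key field selected by
                // index 0 and assigns each record to partition (key mod n)
                return data.partitionCustom(new Partitioner<Long>() {
                    @Override
                    public int partition(Long key, int numPartitions) {
                        return (int) (key % numPartitions);
                    }
                }, 0).collect();
            }

            public static void main(String[] args) throws Exception {
                System.out.println(run());
            }
        }
        ```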
      • rebalance

        public PartitionOperator<T> rebalance()
        Deprecated.
        Enforces a re-balancing of the DataSet, i.e., the DataSet is evenly distributed over all parallel instances of the following task. This can help to improve performance in case of heavy data skew and compute intensive operations.

        Important: This operation shuffles the whole DataSet over the network and can take a significant amount of time.

        Returns:
        The re-balanced DataSet.
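        A minimal sketch (class name and data are illustrative; the `.returns(...)` type hint compensates for lambda type erasure):

        ```java
        import java.util.Collections;
        import java.util.List;

        import org.apache.flink.api.java.DataSet;
        import org.apache.flink.api.java.ExecutionEnvironment;

        public class RebalanceExample {

            static List<Integer> run() throws Exception {
                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
                env.setParallelism(4);

                DataSet<Integer> skewed = env.fromElements(1, 2, 3, 4, 5, 6, 7, 8);

                // rebalance() redistributes records round-robin so each of the four
                // parallel instances of the following map receives a fair share
                List<Integer> result = skewed.rebalance()
                                             .map(i -> i * 10)
                                             .returns(Integer.class)
                                             .collect();
                Collections.sort(result); // parallel execution order is not deterministic
                return result;
            }

            public static void main(String[] args) throws Exception {
                System.out.println(run());
            }
        }
        ```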
      • sortPartition

        public SortPartitionOperator<T> sortPartition​(int field,
                                                      org.apache.flink.api.common.operators.Order order)
        Deprecated.
        Locally sorts the partitions of the DataSet on the specified field in the specified order. DataSet can be sorted on multiple fields by chaining sortPartition() calls.
        Parameters:
        field - The field index on which the DataSet is sorted.
        order - The order in which the DataSet is sorted.
        Returns:
        The DataSet with sorted local partitions.
      • sortPartition

        public SortPartitionOperator<T> sortPartition​(String field,
                                                      org.apache.flink.api.common.operators.Order order)
        Deprecated.
        Locally sorts the partitions of the DataSet on the specified field in the specified order. DataSet can be sorted on multiple fields by chaining sortPartition() calls.
        Parameters:
        field - The field expression referring to the field on which the DataSet is sorted.
        order - The order in which the DataSet is sorted.
        Returns:
        The DataSet with sorted local partitions.
      • sortPartition

        public <K> SortPartitionOperator<T> sortPartition​(org.apache.flink.api.java.functions.KeySelector<T,​K> keyExtractor,
                                                          org.apache.flink.api.common.operators.Order order)
        Deprecated.
        Locally sorts the partitions of the DataSet on the extracted key in the specified order. The DataSet can be sorted on multiple values by returning a tuple from the KeySelector.

        Note that no additional sort keys can be appended to a KeySelector sort key. To sort the partitions by multiple values using a KeySelector, the KeySelector must return a tuple consisting of the values.

        Parameters:
        keyExtractor - The KeySelector function which extracts the key values from the DataSet on which the DataSet is sorted.
        order - The order in which the DataSet is sorted.
        Returns:
        The DataSet with sorted local partitions.
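        A minimal sketch of chained sortPartition() calls on field indexes (parallelism 1 is used here purely for illustration, so that the single local partition is also totally ordered; with higher parallelism each partition is sorted independently):

        ```java
        import java.util.List;

        import org.apache.flink.api.common.operators.Order;
        import org.apache.flink.api.java.DataSet;
        import org.apache.flink.api.java.ExecutionEnvironment;
        import org.apache.flink.api.java.tuple.Tuple2;

        public class SortPartitionExample {

            static List<Tuple2<Integer, String>> run() throws Exception {
                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
                env.setParallelism(1);

                DataSet<Tuple2<Integer, String>> data = env.fromElements(
                        Tuple2.of(2, "b"), Tuple2.of(1, "z"), Tuple2.of(1, "a"));

                // chained calls sort on multiple fields: primary key field 0
                // ascending, secondary key field 1 descending
                return data.sortPartition(0, Order.ASCENDING)
                           .sortPartition(1, Order.DESCENDING)
                           .collect();
            }

            public static void main(String[] args) throws Exception {
                System.out.println(run());
            }
        }
        ```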
      • writeAsText

        public DataSink<T> writeAsText​(String filePath)
        Deprecated.
        Writes a DataSet as text file(s) to the specified location.

        For each element of the DataSet the result of Object.toString() is written.

        Output files and directories
        The output that the writeAsText() method produces depends on the following circumstances:

        • A directory is created and multiple files are written underneath (default behavior).
          This sink creates a directory called "path1", and files "1", "2", ... are written underneath, depending on the parallelism
          .
           └── path1/
               ├── 1
               ├── 2
               └── ...
          Code Example
          dataset.writeAsText("file:///path1");
        • A single file called "path1" is created when the parallelism is set to 1
          .
           └── path1 
          Code Example
          // Parallelism is set only for this particular operation
           dataset.writeAsText("file:///path1").setParallelism(1);
          
           // This has the same effect, but note that the parallelism of all operators is set to one
           env.setParallelism(1);
           ...
           dataset.writeAsText("file:///path1"); 
        • A directory is always created when fs.output.always-create-directory is set to true in the config.yaml file, even when the parallelism is set to 1.
          .
           └── path1/
               └── 1 
          Code Example
          // fs.output.always-create-directory = true
           dataset.writeAsText("file:///path1").setParallelism(1); 
        Parameters:
        filePath - The path pointing to the location where the text file or files under the directory are written.
        Returns:
        The DataSink that writes the DataSet.
        See Also:
        TextOutputFormat
      • writeAsText

        public DataSink<T> writeAsText​(String filePath,
                                       org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
        Deprecated.
        Writes a DataSet as text file(s) to the specified location.

        For each element of the DataSet the result of Object.toString() is written.

        Parameters:
        filePath - The path pointing to the location the text file is written to.
        writeMode - Control the behavior for existing files. Options are NO_OVERWRITE and OVERWRITE.
        Returns:
        The DataSink that writes the DataSet.
        See Also:
        TextOutputFormat, Output files and directories
      • writeAsFormattedText

        public DataSink<String> writeAsFormattedText​(String filePath,
                                                     org.apache.flink.core.fs.FileSystem.WriteMode writeMode,
                                                     TextOutputFormat.TextFormatter<T> formatter)
        Deprecated.
        Writes a DataSet as text file(s) to the specified location.

        For each element of the DataSet the result of TextOutputFormat.TextFormatter.format(Object) is written.

        Parameters:
        filePath - The path pointing to the location the text file is written to.
        writeMode - Control the behavior for existing files. Options are NO_OVERWRITE and OVERWRITE.
        formatter - formatter that is applied on every element of the DataSet.
        Returns:
        The DataSink that writes the DataSet.
        See Also:
        TextOutputFormat, Output files and directories
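        A minimal sketch (the output path "/tmp/fruit.txt" and class name are illustrative; TextFormatter has a single abstract method, so a lambda can be used):

        ```java
        import org.apache.flink.api.java.DataSet;
        import org.apache.flink.api.java.ExecutionEnvironment;
        import org.apache.flink.api.java.tuple.Tuple2;
        import org.apache.flink.core.fs.FileSystem.WriteMode;

        public class FormattedTextExample {

            public static void main(String[] args) throws Exception {
                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
                env.setParallelism(1); // write a single file instead of a directory

                DataSet<Tuple2<String, Integer>> data = env.fromElements(
                        Tuple2.of("apple", 3), Tuple2.of("pear", 5));

                // the formatter turns each element into one line of the output file
                data.writeAsFormattedText("file:///tmp/fruit.txt", WriteMode.OVERWRITE,
                        tuple -> tuple.f0 + " -> " + tuple.f1);

                // sinks are lazy; execute() actually runs the program
                env.execute("formatted text sink");
            }
        }
        ```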
      • writeAsCsv

        public DataSink<T> writeAsCsv​(String filePath)
        Deprecated.
        Writes a Tuple DataSet as CSV file(s) to the specified location.

        Note: Only a Tuple DataSet can be written as a CSV file.

        For each Tuple field the result of Object.toString() is written. Tuple fields are separated by the default field delimiter "comma" (,).

        Tuples are separated by the newline character (\n).

        Parameters:
        filePath - The path pointing to the location the CSV file is written to.
        Returns:
        The DataSink that writes the DataSet.
        See Also:
        Tuple, CsvOutputFormat, Output files and directories
      • writeAsCsv

        public DataSink<T> writeAsCsv​(String filePath,
                                      org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
        Deprecated.
        Writes a Tuple DataSet as CSV file(s) to the specified location.

        Note: Only a Tuple DataSet can be written as a CSV file.

        For each Tuple field the result of Object.toString() is written. Tuple fields are separated by the default field delimiter "comma" (,).

        Tuples are separated by the newline character (\n).

        Parameters:
        filePath - The path pointing to the location the CSV file is written to.
        writeMode - The behavior regarding existing files. Options are NO_OVERWRITE and OVERWRITE.
        Returns:
        The DataSink that writes the DataSet.
        See Also:
        Tuple, CsvOutputFormat, Output files and directories
      • writeAsCsv

        public DataSink<T> writeAsCsv​(String filePath,
                                      String rowDelimiter,
                                      String fieldDelimiter)
        Deprecated.
        Writes a Tuple DataSet as CSV file(s) to the specified location with the specified field and line delimiters.

        Note: Only a Tuple DataSet can be written as a CSV file.

        For each Tuple field the result of Object.toString() is written.

        Parameters:
        filePath - The path pointing to the location the CSV file is written to.
        rowDelimiter - The row delimiter to separate Tuples.
        fieldDelimiter - The field delimiter to separate Tuple fields.
        See Also:
        Tuple, CsvOutputFormat, Output files and directories
      • writeAsCsv

        public DataSink<T> writeAsCsv​(String filePath,
                                      String rowDelimiter,
                                      String fieldDelimiter,
                                      org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
        Deprecated.
        Writes a Tuple DataSet as CSV file(s) to the specified location with the specified field and line delimiters.

        Note: Only a Tuple DataSet can be written as a CSV file. For each Tuple field the result of Object.toString() is written.

        Parameters:
        filePath - The path pointing to the location the CSV file is written to.
        rowDelimiter - The row delimiter to separate Tuples.
        fieldDelimiter - The field delimiter to separate Tuple fields.
        writeMode - The behavior regarding existing files. Options are NO_OVERWRITE and OVERWRITE.
        See Also:
        Tuple, CsvOutputFormat, Output files and directories
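        A minimal sketch of the delimiter variant (the path "/tmp/result.csv", the delimiters, and the class name are illustrative):

        ```java
        import org.apache.flink.api.java.DataSet;
        import org.apache.flink.api.java.ExecutionEnvironment;
        import org.apache.flink.api.java.tuple.Tuple3;
        import org.apache.flink.core.fs.FileSystem.WriteMode;

        public class CsvSinkExample {

            public static void main(String[] args) throws Exception {
                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
                env.setParallelism(1); // write a single file instead of a directory

                DataSet<Tuple3<Integer, String, Double>> data = env.fromElements(
                        Tuple3.of(1, "a", 1.5), Tuple3.of(2, "b", 2.5));

                // rows separated by '\n', fields by '|'; OVERWRITE replaces an existing file
                data.writeAsCsv("file:///tmp/result.csv", "\n", "|", WriteMode.OVERWRITE);

                // sinks are lazy; execute() actually runs the program
                env.execute("csv sink");
            }
        }
        ```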
      • print

        public void print()
                   throws Exception
        Deprecated.
        Prints the elements in a DataSet to the standard output stream System.out of the JVM that calls the print() method. For programs that are executed in a cluster, this method needs to gather the contents of the DataSet back to the client, to print it there.

        The string written for each element is defined by the Object.toString() method.

        This method immediately triggers the program execution, similar to the collect() and count() methods.

        Throws:
        Exception
        See Also:
        printToErr(), printOnTaskManager(String)
      • printToErr

        public void printToErr()
                        throws Exception
        Deprecated.
        Prints the elements in a DataSet to the standard error stream System.err of the JVM that calls the print() method. For programs that are executed in a cluster, this method needs to gather the contents of the DataSet back to the client, to print it there.

        The string written for each element is defined by the Object.toString() method.

        This method immediately triggers the program execution, similar to the collect() and count() methods.

        Throws:
        Exception
        See Also:
        print(), printOnTaskManager(String)
      • printOnTaskManager

        public DataSink<T> printOnTaskManager​(String prefix)
        Deprecated.
        Writes a DataSet to the standard output streams (stdout) of the TaskManagers that execute the program (or more specifically, the data sink operators). On a typical cluster setup, the data will appear in the TaskManagers' .out files.

        To print the data to the console or stdout stream of the client process instead, use the print() method.

        For each element of the DataSet the result of Object.toString() is written.

        Parameters:
        prefix - The string to prefix each line of the output with. This helps to identify outputs from different printing sinks.
        Returns:
        The DataSink operator that writes the DataSet.
        See Also:
        print()
      • print

        @Deprecated
        @PublicEvolving
        public DataSink<T> print​(String sinkIdentifier)
        Deprecated.
        Writes a DataSet to the standard output stream (stdout).

        For each element of the DataSet the result of Object.toString() is written.

        Parameters:
        sinkIdentifier - The string to prefix the output with.
        Returns:
        The DataSink that writes the DataSet.
      • write

        public DataSink<T> write​(org.apache.flink.api.common.io.FileOutputFormat<T> outputFormat,
                                 String filePath)
        Deprecated.
        Writes a DataSet using a FileOutputFormat to a specified location. This method adds a data sink to the program.
        Parameters:
        outputFormat - The FileOutputFormat to write the DataSet.
        filePath - The path to the location where the DataSet is written.
        Returns:
        The DataSink that writes the DataSet.
        See Also:
        FileOutputFormat
      • write

        public DataSink<T> write​(org.apache.flink.api.common.io.FileOutputFormat<T> outputFormat,
                                 String filePath,
                                 org.apache.flink.core.fs.FileSystem.WriteMode writeMode)
        Deprecated.
        Writes a DataSet using a FileOutputFormat to a specified location. This method adds a data sink to the program.
        Parameters:
        outputFormat - The FileOutputFormat to write the DataSet.
        filePath - The path to the location where the DataSet is written.
        writeMode - The mode of writing, indicating whether to overwrite existing files.
        Returns:
        The DataSink that writes the DataSet.
        See Also:
        FileOutputFormat
      • output

        public DataSink<T> output​(org.apache.flink.api.common.io.OutputFormat<T> outputFormat)
        Deprecated.
        Emits a DataSet using an OutputFormat. This method adds a data sink to the program. Programs may have multiple data sinks. A DataSet may also have multiple consumers (data sinks or transformations) at the same time.
        Parameters:
        outputFormat - The OutputFormat to process the DataSet.
        Returns:
        The DataSink that processes the DataSet.
        See Also:
        OutputFormat, DataSink
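        A minimal sketch using LocalCollectionOutputFormat, an OutputFormat that collects elements into an in-memory list (only meaningful for local, single-JVM execution; data and class name are illustrative):

        ```java
        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.List;

        import org.apache.flink.api.java.DataSet;
        import org.apache.flink.api.java.ExecutionEnvironment;
        import org.apache.flink.api.java.io.LocalCollectionOutputFormat;

        public class OutputFormatExample {

            static List<Integer> run() throws Exception {
                ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
                env.setParallelism(1);

                List<Integer> sink = new ArrayList<>();

                DataSet<Integer> data = env.fromElements(1, 2, 3);

                // output() only registers the sink with the program
                data.output(new LocalCollectionOutputFormat<>(sink));

                env.execute("collection sink"); // sinks are lazy; execute() runs the program
                Collections.sort(sink);
                return sink;
            }

            public static void main(String[] args) throws Exception {
                System.out.println(run());
            }
        }
        ```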
      • checkSameExecutionContext

        protected static void checkSameExecutionContext​(DataSet<?> set1,
                                                        DataSet<?> set2)
        Deprecated.