Class Optimizer


  • public class Optimizer
    extends Object
    The optimizer that takes the user specified program plan and creates an optimized plan that contains exact descriptions about how the physical execution will take place. It first translates the user program into an internal optimizer representation and then chooses between different alternatives for shipping strategies and local strategies.

    The basic principle is taken from optimizer works in systems such as Volcano/Cascades and Selinger/System-R/DB2. The optimizer walks from the sinks down, generating interesting properties, and ascends from the sources generating alternative plans, pruning against the interesting properties.

    The optimizer also assigns the memory to the individual tasks. This is currently done in a very simple fashion: All sub-tasks that need memory (e.g. reduce or join) are given an equal share of memory.

    • Field Detail

      • HINT_SHIP_STRATEGY

        public static final String HINT_SHIP_STRATEGY
        Compiler hint key for the input channel's shipping strategy. This String is a key to the operator's stub parameters. The corresponding value tells the compiler which shipping strategy to use for the input channel. If the operator has two input channels, the shipping strategy is applied to both input channels.
        See Also:
        Constant Field Values
      • HINT_SHIP_STRATEGY_FIRST_INPUT

        public static final String HINT_SHIP_STRATEGY_FIRST_INPUT
        Compiler hint key for the first input channel's shipping strategy. This String is a key to the operator's stub parameters. The corresponding value tells the compiler which shipping strategy to use for the first input channel. Only applicable to operators with two inputs.
        See Also:
        Constant Field Values
      • HINT_SHIP_STRATEGY_SECOND_INPUT

        public static final String HINT_SHIP_STRATEGY_SECOND_INPUT
        Compiler hint key for the second input channel's shipping strategy. This String is a key to the operator's stub parameters. The corresponding value tells the compiler which shipping strategy to use for the second input channel. Only applicable to operators with two inputs.
        See Also:
        Constant Field Values
      • HINT_LOCAL_STRATEGY

        public static final String HINT_LOCAL_STRATEGY
        Compiler hint key for the operator's local strategy. This String is a key to the operator's stub parameters. The corresponding value tells the compiler which local strategy to use to process the data inside one partition.

        This hint is ignored by operators that do not have a local strategy (such as Map), or by operators that have no choice in their local strategy (such as Cross).

        See Also:
        Constant Field Values
      • HINT_LOCAL_STRATEGY_SORT

        public static final String HINT_LOCAL_STRATEGY_SORT
        Value for the local strategy compiler hint that enforces a sort based local strategy. For example, a Reduce operator will sort the data to group it.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_COMBINING_SORT

        public static final String HINT_LOCAL_STRATEGY_COMBINING_SORT
        Value for the local strategy compiler hint that enforces a sort based local strategy. During sorting a combine method is repeatedly applied to reduce the data volume. For example, a Reduce operator will sort the data to group it.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_SORT_BOTH_MERGE

        public static final String HINT_LOCAL_STRATEGY_SORT_BOTH_MERGE
        Value for the local strategy compiler hint that enforces a sort merge based local strategy on both inputs with subsequent merging of inputs. For example, a Match or CoGroup operator will use a sort-merge strategy to find pairs of matching keys.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_SORT_FIRST_MERGE

        public static final String HINT_LOCAL_STRATEGY_SORT_FIRST_MERGE
        Value for the local strategy compiler hint that enforces a sort merge based local strategy. The first input is sorted, the second input is assumed to be sorted. After sorting both inputs are merged. For example, a Match or CoGroup operator will use a sort-merge strategy to find pairs of matching keys.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_SORT_SECOND_MERGE

        public static final String HINT_LOCAL_STRATEGY_SORT_SECOND_MERGE
        Value for the local strategy compiler hint that enforces a sort merge based local strategy. The second input is sorted, the first input is assumed to be sorted. After sorting both inputs are merged. For example, a Match or CoGroup operator will use a sort-merge strategy to find pairs of matching keys.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_MERGE

        public static final String HINT_LOCAL_STRATEGY_MERGE
        Value for the local strategy compiler hint that enforces a merge based local strategy. Both inputs are assumed to be sorted and are merged. For example, a Match or CoGroup operator will use a merge strategy to find pairs of matching keys.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_HASH_BUILD_FIRST

        public static final String HINT_LOCAL_STRATEGY_HASH_BUILD_FIRST
        Value for the local strategy compiler hint that enforces a hash based local strategy. For example, a Match operator will use a hybrid-hash-join strategy to find pairs of matching keys. The first input will be used to build the hash table, the second input will be used to probe the table.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_HASH_BUILD_SECOND

        public static final String HINT_LOCAL_STRATEGY_HASH_BUILD_SECOND
        Value for the local strategy compiler hint that enforces a hash based local strategy. For example, a Match operator will use a hybrid-hash-join strategy to find pairs of matching keys. The second input will be used to build the hash table, the first input will be used to probe the table.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_NESTEDLOOP_STREAMED_OUTER_FIRST

        public static final String HINT_LOCAL_STRATEGY_NESTEDLOOP_STREAMED_OUTER_FIRST
        Value for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy. A Cross operator will process the data of the first input in the outer-loop of the nested loops. Hence, the data of the first input will be is streamed though, while the data of the second input is stored on disk and repeatedly read.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_NESTEDLOOP_STREAMED_OUTER_SECOND

        public static final String HINT_LOCAL_STRATEGY_NESTEDLOOP_STREAMED_OUTER_SECOND
        Value for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy. A Cross operator will process the data of the second input in the outer-loop of the nested loops. Hence, the data of the second input will be is streamed though, while the data of the first input is stored on disk and repeatedly read.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_NESTEDLOOP_BLOCKED_OUTER_FIRST

        public static final String HINT_LOCAL_STRATEGY_NESTEDLOOP_BLOCKED_OUTER_FIRST
        Value for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy. A Cross operator will process the data of the first input in the outer-loop of the nested loops. Further more, the first input, being the outer side, will be processed in blocks, and for each block, the second input, being the inner side, will read repeatedly from disk.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • HINT_LOCAL_STRATEGY_NESTEDLOOP_BLOCKED_OUTER_SECOND

        public static final String HINT_LOCAL_STRATEGY_NESTEDLOOP_BLOCKED_OUTER_SECOND
        Value for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy. A Cross operator will process the data of the second input in the outer-loop of the nested loops. Further more, the second input, being the outer side, will be processed in blocks, and for each block, the first input, being the inner side, will read repeatedly from disk.
        See Also:
        HINT_LOCAL_STRATEGY, Constant Field Values
      • LOG

        public static final org.slf4j.Logger LOG
        The log handle that is used by the compiler to log messages.
    • Constructor Detail

      • Optimizer

        public Optimizer​(org.apache.flink.configuration.Configuration config)
        Creates a new optimizer instance. The optimizer has no access to statistics about the inputs and can hence not determine any properties. It will perform all optimization with unknown sizes and hence use only the heuristic cost functions, which result in the selection of the most robust execution strategies.
      • Optimizer

        public Optimizer​(DataStatistics stats,
                         org.apache.flink.configuration.Configuration config)
        Creates a new optimizer instance that uses the statistics object to determine properties about the input. Given those statistics, the optimizer can make better choices for the execution strategies.
        Parameters:
        stats - The statistics to be used to determine the input properties.
      • Optimizer

        public Optimizer​(CostEstimator estimator,
                         org.apache.flink.configuration.Configuration config)
        Creates a new optimizer instance. The optimizer has no access to statistics about the inputs and can hence not determine any properties. It will perform all optimization with unknown sizes and hence use only the heuristic cost functions, which result in the selection of the most robust execution strategies.

        The optimizer uses the given cost estimator to compute the costs of the individual operations.

        Parameters:
        estimator - The cost estimator to use to cost the individual operations.
      • Optimizer

        public Optimizer​(DataStatistics stats,
                         CostEstimator estimator,
                         org.apache.flink.configuration.Configuration config)
        Creates a new optimizer instance that uses the statistics object to determine properties about the input. Given those statistics, the optimizer can make better choices for the execution strategies.

        The optimizer uses the given cost estimator to compute the costs of the individual operations.

        Parameters:
        stats - The statistics to be used to determine the input properties.
        estimator - The CostEstimator to use to cost the individual operations.
    • Method Detail

      • getDefaultParallelism

        public int getDefaultParallelism()
      • setDefaultParallelism

        public void setDefaultParallelism​(int defaultParallelism)
      • compile

        public OptimizedPlan compile​(org.apache.flink.api.common.Plan program)
                              throws CompilerException
        Translates the given program to an OptimizedPlan, where all nodes have their local strategy assigned and all channels have a shipping strategy assigned.

        For more details on the optimization phase, see the comments for compile(org.apache.flink.api.common.Plan, org.apache.flink.optimizer.postpass.OptimizerPostPass).

        Parameters:
        program - The program to be translated.
        Returns:
        The optimized plan.
        Throws:
        CompilerException - Thrown, if the plan is invalid or the optimizer encountered an inconsistent situation during the compilation process.
      • createPreOptimizedPlan

        public static List<DataSinkNode> createPreOptimizedPlan​(org.apache.flink.api.common.Plan program)
        This function performs only the first step to the compilation process - the creation of the optimizer representation of the plan. No estimations or enumerations of alternatives are done here.
        Parameters:
        program - The plan to generate the optimizer representation for.
        Returns:
        The optimizer representation of the plan, as a collection of all data sinks from the plan can be traversed.