Class Optimizer
- java.lang.Object
-
- org.apache.flink.optimizer.Optimizer
-
public class Optimizer extends Object
The optimizer that takes the user specified program plan and creates an optimized plan that contains exact descriptions about how the physical execution will take place. It first translates the user program into an internal optimizer representation and then chooses between different alternatives for shipping strategies and local strategies.The basic principle is taken from optimizer works in systems such as Volcano/Cascades and Selinger/System-R/DB2. The optimizer walks from the sinks down, generating interesting properties, and ascends from the sources generating alternative plans, pruning against the interesting properties.
The optimizer also assigns the memory to the individual tasks. This is currently done in a very simple fashion: All sub-tasks that need memory (e.g. reduce or join) are given an equal share of memory.
-
-
Field Summary
Fields Modifier and Type Field Description static StringHINT_LOCAL_STRATEGYCompiler hint key for the operator's local strategy.static StringHINT_LOCAL_STRATEGY_COMBINING_SORTValue for the local strategy compiler hint that enforces a sort based local strategy.static StringHINT_LOCAL_STRATEGY_HASH_BUILD_FIRSTValue for the local strategy compiler hint that enforces a hash based local strategy.static StringHINT_LOCAL_STRATEGY_HASH_BUILD_SECONDValue for the local strategy compiler hint that enforces a hash based local strategy.static StringHINT_LOCAL_STRATEGY_MERGEValue for the local strategy compiler hint that enforces a merge based local strategy.static StringHINT_LOCAL_STRATEGY_NESTEDLOOP_BLOCKED_OUTER_FIRSTValue for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy.static StringHINT_LOCAL_STRATEGY_NESTEDLOOP_BLOCKED_OUTER_SECONDValue for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy.static StringHINT_LOCAL_STRATEGY_NESTEDLOOP_STREAMED_OUTER_FIRSTValue for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy.static StringHINT_LOCAL_STRATEGY_NESTEDLOOP_STREAMED_OUTER_SECONDValue for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy.static StringHINT_LOCAL_STRATEGY_SORTValue for the local strategy compiler hint that enforces a sort based local strategy.static StringHINT_LOCAL_STRATEGY_SORT_BOTH_MERGEValue for the local strategy compiler hint that enforces a sort merge based local strategy on both inputs with subsequent merging of inputs.static StringHINT_LOCAL_STRATEGY_SORT_FIRST_MERGEValue for the local strategy compiler hint that enforces a sort merge based local strategy.static StringHINT_LOCAL_STRATEGY_SORT_SECOND_MERGEValue for the local strategy compiler hint that enforces a sort merge based local strategy.static StringHINT_SHIP_STRATEGYCompiler hint key for the input channel's shipping strategy.static StringHINT_SHIP_STRATEGY_BROADCASTValue for the shipping strategy compiler hint that enforces a broadcast strategy on the input channel.static StringHINT_SHIP_STRATEGY_FIRST_INPUTCompiler hint key for the first input channel's shipping strategy.static StringHINT_SHIP_STRATEGY_FORWARDValue for the shipping strategy compiler hint that enforces a Forward strategy on the input channel, i.e.static StringHINT_SHIP_STRATEGY_REPARTITIONValue for the shipping strategy compiler hint that enforces a random repartition strategy.static StringHINT_SHIP_STRATEGY_REPARTITION_HASHValue for the shipping strategy compiler hint that enforces a hash-partition strategy.static StringHINT_SHIP_STRATEGY_REPARTITION_RANGEValue for the shipping strategy compiler hint that enforces a range-partition strategy.static StringHINT_SHIP_STRATEGY_SECOND_INPUTCompiler hint key for the second input channel's shipping strategy.static org.slf4j.LoggerLOGThe log handle that is used by the compiler to log messages.
-
Constructor Summary
Constructors Constructor Description Optimizer(org.apache.flink.configuration.Configuration config)Creates a new optimizer instance.Optimizer(CostEstimator estimator, org.apache.flink.configuration.Configuration config)Creates a new optimizer instance.Optimizer(DataStatistics stats, org.apache.flink.configuration.Configuration config)Creates a new optimizer instance that uses the statistics object to determine properties about the input.Optimizer(DataStatistics stats, CostEstimator estimator, org.apache.flink.configuration.Configuration config)Creates a new optimizer instance that uses the statistics object to determine properties about the input.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description OptimizedPlancompile(org.apache.flink.api.common.Plan program)Translates the given program to an OptimizedPlan, where all nodes have their local strategy assigned and all channels have a shipping strategy assigned.static List<DataSinkNode>createPreOptimizedPlan(org.apache.flink.api.common.Plan program)This function performs only the first step to the compilation process - the creation of the optimizer representation of the plan.intgetDefaultParallelism()voidsetDefaultParallelism(int defaultParallelism)
-
-
-
Field Detail
-
HINT_SHIP_STRATEGY
public static final String HINT_SHIP_STRATEGY
Compiler hint key for the input channel's shipping strategy. This String is a key to the operator's stub parameters. The corresponding value tells the compiler which shipping strategy to use for the input channel. If the operator has two input channels, the shipping strategy is applied to both input channels.- See Also:
- Constant Field Values
-
HINT_SHIP_STRATEGY_FIRST_INPUT
public static final String HINT_SHIP_STRATEGY_FIRST_INPUT
Compiler hint key for the first input channel's shipping strategy. This String is a key to the operator's stub parameters. The corresponding value tells the compiler which shipping strategy to use for the first input channel. Only applicable to operators with two inputs.- See Also:
- Constant Field Values
-
HINT_SHIP_STRATEGY_SECOND_INPUT
public static final String HINT_SHIP_STRATEGY_SECOND_INPUT
Compiler hint key for the second input channel's shipping strategy. This String is a key to the operator's stub parameters. The corresponding value tells the compiler which shipping strategy to use for the second input channel. Only applicable to operators with two inputs.- See Also:
- Constant Field Values
-
HINT_SHIP_STRATEGY_FORWARD
public static final String HINT_SHIP_STRATEGY_FORWARD
Value for the shipping strategy compiler hint that enforces a Forward strategy on the input channel, i.e. no redistribution of any kind.
-
HINT_SHIP_STRATEGY_REPARTITION
public static final String HINT_SHIP_STRATEGY_REPARTITION
Value for the shipping strategy compiler hint that enforces a random repartition strategy.
-
HINT_SHIP_STRATEGY_REPARTITION_HASH
public static final String HINT_SHIP_STRATEGY_REPARTITION_HASH
Value for the shipping strategy compiler hint that enforces a hash-partition strategy.
-
HINT_SHIP_STRATEGY_REPARTITION_RANGE
public static final String HINT_SHIP_STRATEGY_REPARTITION_RANGE
Value for the shipping strategy compiler hint that enforces a range-partition strategy.
-
HINT_SHIP_STRATEGY_BROADCAST
public static final String HINT_SHIP_STRATEGY_BROADCAST
Value for the shipping strategy compiler hint that enforces a broadcast strategy on the input channel.
-
HINT_LOCAL_STRATEGY
public static final String HINT_LOCAL_STRATEGY
Compiler hint key for the operator's local strategy. This String is a key to the operator's stub parameters. The corresponding value tells the compiler which local strategy to use to process the data inside one partition.This hint is ignored by operators that do not have a local strategy (such as Map), or by operators that have no choice in their local strategy (such as Cross).
- See Also:
- Constant Field Values
-
HINT_LOCAL_STRATEGY_SORT
public static final String HINT_LOCAL_STRATEGY_SORT
Value for the local strategy compiler hint that enforces a sort based local strategy. For example, a Reduce operator will sort the data to group it.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_COMBINING_SORT
public static final String HINT_LOCAL_STRATEGY_COMBINING_SORT
Value for the local strategy compiler hint that enforces a sort based local strategy. During sorting a combine method is repeatedly applied to reduce the data volume. For example, a Reduce operator will sort the data to group it.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_SORT_BOTH_MERGE
public static final String HINT_LOCAL_STRATEGY_SORT_BOTH_MERGE
Value for the local strategy compiler hint that enforces a sort merge based local strategy on both inputs with subsequent merging of inputs. For example, a Match or CoGroup operator will use a sort-merge strategy to find pairs of matching keys.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_SORT_FIRST_MERGE
public static final String HINT_LOCAL_STRATEGY_SORT_FIRST_MERGE
Value for the local strategy compiler hint that enforces a sort merge based local strategy. The first input is sorted, the second input is assumed to be sorted. After sorting both inputs are merged. For example, a Match or CoGroup operator will use a sort-merge strategy to find pairs of matching keys.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_SORT_SECOND_MERGE
public static final String HINT_LOCAL_STRATEGY_SORT_SECOND_MERGE
Value for the local strategy compiler hint that enforces a sort merge based local strategy. The second input is sorted, the first input is assumed to be sorted. After sorting both inputs are merged. For example, a Match or CoGroup operator will use a sort-merge strategy to find pairs of matching keys.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_MERGE
public static final String HINT_LOCAL_STRATEGY_MERGE
Value for the local strategy compiler hint that enforces a merge based local strategy. Both inputs are assumed to be sorted and are merged. For example, a Match or CoGroup operator will use a merge strategy to find pairs of matching keys.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_HASH_BUILD_FIRST
public static final String HINT_LOCAL_STRATEGY_HASH_BUILD_FIRST
Value for the local strategy compiler hint that enforces a hash based local strategy. For example, a Match operator will use a hybrid-hash-join strategy to find pairs of matching keys. The first input will be used to build the hash table, the second input will be used to probe the table.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_HASH_BUILD_SECOND
public static final String HINT_LOCAL_STRATEGY_HASH_BUILD_SECOND
Value for the local strategy compiler hint that enforces a hash based local strategy. For example, a Match operator will use a hybrid-hash-join strategy to find pairs of matching keys. The second input will be used to build the hash table, the first input will be used to probe the table.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_NESTEDLOOP_STREAMED_OUTER_FIRST
public static final String HINT_LOCAL_STRATEGY_NESTEDLOOP_STREAMED_OUTER_FIRST
Value for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy. A Cross operator will process the data of the first input in the outer-loop of the nested loops. Hence, the data of the first input will be is streamed though, while the data of the second input is stored on disk and repeatedly read.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_NESTEDLOOP_STREAMED_OUTER_SECOND
public static final String HINT_LOCAL_STRATEGY_NESTEDLOOP_STREAMED_OUTER_SECOND
Value for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy. A Cross operator will process the data of the second input in the outer-loop of the nested loops. Hence, the data of the second input will be is streamed though, while the data of the first input is stored on disk and repeatedly read.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_NESTEDLOOP_BLOCKED_OUTER_FIRST
public static final String HINT_LOCAL_STRATEGY_NESTEDLOOP_BLOCKED_OUTER_FIRST
Value for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy. A Cross operator will process the data of the first input in the outer-loop of the nested loops. Further more, the first input, being the outer side, will be processed in blocks, and for each block, the second input, being the inner side, will read repeatedly from disk.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
HINT_LOCAL_STRATEGY_NESTEDLOOP_BLOCKED_OUTER_SECOND
public static final String HINT_LOCAL_STRATEGY_NESTEDLOOP_BLOCKED_OUTER_SECOND
Value for the local strategy compiler hint that chooses the outer side of the nested-loop local strategy. A Cross operator will process the data of the second input in the outer-loop of the nested loops. Further more, the second input, being the outer side, will be processed in blocks, and for each block, the first input, being the inner side, will read repeatedly from disk.- See Also:
HINT_LOCAL_STRATEGY, Constant Field Values
-
LOG
public static final org.slf4j.Logger LOG
The log handle that is used by the compiler to log messages.
-
-
Constructor Detail
-
Optimizer
public Optimizer(org.apache.flink.configuration.Configuration config)
Creates a new optimizer instance. The optimizer has no access to statistics about the inputs and can hence not determine any properties. It will perform all optimization with unknown sizes and hence use only the heuristic cost functions, which result in the selection of the most robust execution strategies.
-
Optimizer
public Optimizer(DataStatistics stats, org.apache.flink.configuration.Configuration config)
Creates a new optimizer instance that uses the statistics object to determine properties about the input. Given those statistics, the optimizer can make better choices for the execution strategies.- Parameters:
stats- The statistics to be used to determine the input properties.
-
Optimizer
public Optimizer(CostEstimator estimator, org.apache.flink.configuration.Configuration config)
Creates a new optimizer instance. The optimizer has no access to statistics about the inputs and can hence not determine any properties. It will perform all optimization with unknown sizes and hence use only the heuristic cost functions, which result in the selection of the most robust execution strategies.The optimizer uses the given cost estimator to compute the costs of the individual operations.
- Parameters:
estimator- The cost estimator to use to cost the individual operations.
-
Optimizer
public Optimizer(DataStatistics stats, CostEstimator estimator, org.apache.flink.configuration.Configuration config)
Creates a new optimizer instance that uses the statistics object to determine properties about the input. Given those statistics, the optimizer can make better choices for the execution strategies.The optimizer uses the given cost estimator to compute the costs of the individual operations.
- Parameters:
stats- The statistics to be used to determine the input properties.estimator- The CostEstimator to use to cost the individual operations.
-
-
Method Detail
-
getDefaultParallelism
public int getDefaultParallelism()
-
setDefaultParallelism
public void setDefaultParallelism(int defaultParallelism)
-
compile
public OptimizedPlan compile(org.apache.flink.api.common.Plan program) throws CompilerException
Translates the given program to an OptimizedPlan, where all nodes have their local strategy assigned and all channels have a shipping strategy assigned.For more details on the optimization phase, see the comments for
compile(org.apache.flink.api.common.Plan, org.apache.flink.optimizer.postpass.OptimizerPostPass).- Parameters:
program- The program to be translated.- Returns:
- The optimized plan.
- Throws:
CompilerException- Thrown, if the plan is invalid or the optimizer encountered an inconsistent situation during the compilation process.
-
createPreOptimizedPlan
public static List<DataSinkNode> createPreOptimizedPlan(org.apache.flink.api.common.Plan program)
This function performs only the first step to the compilation process - the creation of the optimizer representation of the plan. No estimations or enumerations of alternatives are done here.- Parameters:
program- The plan to generate the optimizer representation for.- Returns:
- The optimizer representation of the plan, as a collection of all data sinks from the plan can be traversed.
-
-