Class ReplicatingInputFormat<OT,S extends InputSplit>
- java.lang.Object
-
- org.apache.flink.api.common.io.RichInputFormat<OT,S>
-
- org.apache.flink.api.common.io.ReplicatingInputFormat<OT,S>
-
- Type Parameters:
OT - The output type of the wrapped InputFormat.
S - The InputSplit type of the wrapped InputFormat.
- All Implemented Interfaces:
Serializable, InputFormat<OT,S>, InputSplitSource<S>
@PublicEvolving public final class ReplicatingInputFormat<OT,S extends InputSplit> extends RichInputFormat<OT,S>
A ReplicatingInputFormat replicates any InputFormat to all parallel instances of a DataSource, i.e., the full input of the replicated InputFormat is completely processed by each parallel instance of the DataSource. This is done by assigning all InputSplits generated by the replicated InputFormat to each parallel instance.
Replicated data can only be used as input for an InnerJoinOperatorBase or CrossOperatorBase with the same parallelism as the DataSource. Before being used as an input to a Join or Cross operator, replicated data might be processed in local pipelines by Map-based operators with the same parallelism as the source. Map-based operators are MapOperatorBase, FlatMapOperatorBase, FilterOperatorBase, and MapPartitionOperatorBase.
Replicated DataSources can be used for local join processing (no data shipping) if one input is accessible on all parallel instances of a join and the other input is (randomly) partitioned across all parallel instances.
However, a replicated DataSource is a plan hint that can invalidate a Flink program if not used correctly (see usage instructions above). In such situations, the optimizer is not able to generate a valid execution plan and the program execution will fail.
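The difference between replicated and ordinary split assignment described above can be modeled in a few lines. This is a conceptual sketch in plain Java, not Flink's implementation: the class `ReplicatedAssignmentSketch`, its methods, and the use of strings as stand-ins for input splits are all hypothetical, and the round-robin strategy in the contrasting case is chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: plain Java, no Flink classes involved.
final class ReplicatedAssignmentSketch {

    // Replicated assignment: every parallel instance receives every split.
    static Map<Integer, List<String>> assignReplicated(List<String> splits, int parallelism) {
        Map<Integer, List<String>> plan = new LinkedHashMap<>();
        for (int instance = 0; instance < parallelism; instance++) {
            plan.put(instance, new ArrayList<>(splits));
        }
        return plan;
    }

    // For contrast: each split goes to exactly one instance
    // (round-robin, an illustrative choice).
    static Map<Integer, List<String>> assignPartitioned(List<String> splits, int parallelism) {
        Map<Integer, List<String>> plan = new LinkedHashMap<>();
        for (int instance = 0; instance < parallelism; instance++) {
            plan.put(instance, new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            plan.get(i % parallelism).add(splits.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("split-0", "split-1", "split-2");
        System.out.println("replicated:  " + assignReplicated(splits, 2));
        System.out.println("partitioned: " + assignPartitioned(splits, 2));
    }
}
```

In this model, a join between the replicated plan and the partitioned plan can be evaluated per instance without moving data, because each instance already holds the complete replicated side.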
-
-
Constructor Summary
Constructor - Description
ReplicatingInputFormat(InputFormat<OT,S> wrappedIF)
-
Method Summary
Modifier and Type, Method - Description
void close() - Method that marks the end of the life-cycle of an input split.
void closeInputFormat() - Closes this InputFormat instance.
void configure(Configuration parameters) - Configures this input format.
S[] createInputSplits(int minNumSplits) - Computes the input splits.
InputSplitAssigner getInputSplitAssigner(S[] inputSplits) - Returns the assigner for the input splits.
InputFormat<OT,S> getReplicatedInputFormat()
RuntimeContext getRuntimeContext()
BaseStatistics getStatistics(BaseStatistics cachedStatistics) - Gets the basic statistics from the input described by this format.
OT nextRecord(OT reuse) - Reads the next record from the input.
void open(S split) - Opens a parallel instance of the input format to work on a split.
void openInputFormat() - Opens this InputFormat instance.
boolean reachedEnd() - Method used to check if the end of the input is reached.
void setRuntimeContext(RuntimeContext context)
-
-
-
Constructor Detail
-
ReplicatingInputFormat
public ReplicatingInputFormat(InputFormat<OT,S> wrappedIF)
-
-
Method Detail
-
getReplicatedInputFormat
public InputFormat<OT,S> getReplicatedInputFormat()
-
configure
public void configure(Configuration parameters)
Description copied from interface: InputFormat
Configures this input format. Since input formats are instantiated generically and hence parameterless, this method is the place where the input formats set their basic fields based on configuration values.
This method is always called first on a newly instantiated input format.
- Parameters:
parameters - The configuration with all parameters (note: not the Flink config but the TaskConfig).
-
getStatistics
public BaseStatistics getStatistics(BaseStatistics cachedStatistics) throws IOException
Description copied from interface: InputFormat
Gets the basic statistics from the input described by this format. If the input format does not know how to create those statistics, it may return null. This method optionally gets a cached version of the statistics. The input format may examine them and decide whether it directly returns them without spending effort to re-gather the statistics.
When this method is called, the input format is guaranteed to be configured.
- Parameters:
cachedStatistics - The statistics that were cached. May be null.
- Returns:
- The base statistics for the input, or null, if not available.
- Throws:
IOException
-
createInputSplits
public S[] createInputSplits(int minNumSplits) throws IOException
Description copied from interface: InputSplitSource
Computes the input splits. The given minimum number of splits is a hint as to how many splits are desired.
- Parameters:
minNumSplits - Minimum number of input splits, as a hint.
- Returns:
- An array of input splits.
- Throws:
IOException
-
getInputSplitAssigner
public InputSplitAssigner getInputSplitAssigner(S[] inputSplits)
Description copied from interface: InputSplitSource
Returns the assigner for the input splits. The assigner determines which parallel instance of the input format gets which input split.
- Returns:
- The input split assigner.
-
open
public void open(S split) throws IOException
Description copied from interface: InputFormat
Opens a parallel instance of the input format to work on a split.
When this method is called, the input format is guaranteed to be configured.
- Parameters:
split - The split to be opened.
- Throws:
IOException - Thrown, if the split could not be opened due to an I/O problem.
-
reachedEnd
public boolean reachedEnd() throws IOException
Description copied from interface: InputFormat
Method used to check if the end of the input is reached.
When this method is called, the input format is guaranteed to be opened.
- Returns:
- True if the end is reached, otherwise false.
- Throws:
IOException - Thrown, if an I/O error occurred.
-
nextRecord
public OT nextRecord(OT reuse) throws IOException
Description copied from interface: InputFormat
Reads the next record from the input.
When this method is called, the input format is guaranteed to be opened.
- Parameters:
reuse - Object that may be reused.
- Returns:
- Read record.
- Throws:
IOException - Thrown, if an I/O error occurred.
-
close
public void close() throws IOException
Description copied from interface: InputFormat
Method that marks the end of the life-cycle of an input split. Should be used to close channels and streams and release resources. After this method returns without an error, the input is assumed to be correctly read.
When this method is called, the input format is guaranteed to be opened.
- Throws:
IOException - Thrown, if the input could not be closed properly.
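The per-split contract of open, reachedEnd, nextRecord, and close can be illustrated with a small read loop. This is a minimal sketch under stated assumptions: `SplitReader` is a hypothetical, reduced stand-in for the per-split methods documented above (it is not a Flink interface), `ListReader` is a toy implementation that treats an in-memory list as a split, and calling `close()` in a `finally` block is a design choice of this sketch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class SplitReadLoopSketch {

    // Hypothetical stand-in mirroring only the per-split methods documented above.
    interface SplitReader<T, S> {
        void open(S split) throws IOException;
        boolean reachedEnd() throws IOException;
        T nextRecord(T reuse) throws IOException;
        void close() throws IOException;
    }

    // Toy reader: a "split" is a list of strings held in memory.
    static final class ListReader implements SplitReader<String, List<String>> {
        private List<String> current;
        private int pos;

        @Override public void open(List<String> split) { current = split; pos = 0; }
        @Override public boolean reachedEnd() { return pos >= current.size(); }
        // This toy reader ignores the reuse argument.
        @Override public String nextRecord(String reuse) { return current.get(pos++); }
        @Override public void close() { current = null; }
    }

    // For each split: open it, drain records until reachedEnd(), then close it.
    static <T, S> List<T> readAll(SplitReader<T, S> reader, List<S> splits) throws IOException {
        List<T> out = new ArrayList<>();
        for (S split : splits) {
            reader.open(split);
            try {
                while (!reader.reachedEnd()) {
                    out.add(reader.nextRecord(null));
                }
            } finally {
                reader.close();
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> splits = List.of(List.of("a", "b"), List.of("c"));
        System.out.println(readAll(new ListReader(), splits));
    }
}
```

The loop only calls `reachedEnd()` and `nextRecord` between `open` and `close`, which matches the guarantee stated for those methods that the format has been opened.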
-
setRuntimeContext
public void setRuntimeContext(RuntimeContext context)
- Overrides:
setRuntimeContext in class RichInputFormat<OT,S extends InputSplit>
-
getRuntimeContext
public RuntimeContext getRuntimeContext()
- Overrides:
getRuntimeContext in class RichInputFormat<OT,S extends InputSplit>
-
openInputFormat
public void openInputFormat() throws IOException
Description copied from class: RichInputFormat
Opens this InputFormat instance. This method is called once per parallel instance. Resources (e.g. database connections, caches) should be allocated in this method.
- Overrides:
openInputFormat in class RichInputFormat<OT,S extends InputSplit>
- Throws:
IOException - in case allocating the resources failed.
- See Also:
InputFormat
-
closeInputFormat
public void closeInputFormat() throws IOException
Description copied from class: RichInputFormat
Closes this InputFormat instance. This method is called once per parallel instance. Resources allocated during RichInputFormat.openInputFormat() should be closed in this method.
- Overrides:
closeInputFormat in class RichInputFormat<OT,S extends InputSplit>
- Throws:
IOException - in case closing the resources failed.
- See Also:
InputFormat
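The distinction between the once-per-instance methods (openInputFormat, closeInputFormat) and the per-split methods (open, close) can be made visible by recording the call order. The sketch below is illustrative only: `RecordingFormat` and `runInstance` are hypothetical names, the format is a plain Java class rather than a Flink type, and the exact nesting of calls is an assumption derived from the "once per parallel instance" wording above.

```java
import java.util.ArrayList;
import java.util.List;

final class InstanceLifecycleSketch {

    // Hypothetical format that only records which lifecycle method was called.
    static final class RecordingFormat {
        final List<String> calls = new ArrayList<>();

        void openInputFormat()  { calls.add("openInputFormat"); }
        void open(String split) { calls.add("open(" + split + ")"); }
        void close()            { calls.add("close"); }
        void closeInputFormat() { calls.add("closeInputFormat"); }
    }

    // One parallel instance: instance-wide setup once, then open/close per split.
    static List<String> runInstance(RecordingFormat format, List<String> splits) {
        format.openInputFormat();
        try {
            for (String split : splits) {
                format.open(split);
                // Records of this split would be read here.
                format.close();
            }
        } finally {
            // Instance-wide resources are released exactly once.
            format.closeInputFormat();
        }
        return format.calls;
    }

    public static void main(String[] args) {
        System.out.println(runInstance(new RecordingFormat(), List.of("s0", "s1")));
    }
}
```

Placing expensive resources such as connections in the instance-wide pair avoids reallocating them for every split, which matters more for a replicated source because each parallel instance processes all splits.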
-
-