Class HashJoinOperator

  • All Implemented Interfaces:
    Serializable, org.apache.flink.api.common.state.CheckpointListener, org.apache.flink.streaming.api.operators.BoundedMultiInput, org.apache.flink.streaming.api.operators.InputSelectable, org.apache.flink.streaming.api.operators.KeyContext, org.apache.flink.streaming.api.operators.KeyContextHandler, org.apache.flink.streaming.api.operators.SetupableStreamOperator<org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.CheckpointedStreamOperator, org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,​org.apache.flink.table.data.RowData,​org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.YieldingOperator<org.apache.flink.table.data.RowData>

    public abstract class HashJoinOperator
    extends TableStreamOperator<org.apache.flink.table.data.RowData>
    implements org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,​org.apache.flink.table.data.RowData,​org.apache.flink.table.data.RowData>, org.apache.flink.streaming.api.operators.BoundedMultiInput, org.apache.flink.streaming.api.operators.InputSelectable
    Hash join base operator.

    The join operator implements the logic of a join operator at runtime. It uses a hybrid-hash-join internally to match the records with equal key. The build side of the hash is the first input of the match. It support all join type in HashJoinType.

    Note: In order to solve the problem of data skew, or too much data in the hash table, the fallback to sort merge join mechanism is introduced here. If some partitions are spilled to disk more than three times in the process of hash join, it will fallback to sort merge join by default to improve stability. In the future, we will support more flexible adaptive hash join strategy, for example, in the process of building a hash table, if the size of data written to disk reaches a certain threshold, fallback to sort merge join in advance.

    See Also:
    Serialized Form
    • Field Summary

      • Fields inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator

        chainingStrategy, config, lastRecordAttributes1, lastRecordAttributes2, latencyStats, metrics, output, processingTimeService, stateHandler, stateKeySelector1, stateKeySelector2, timeServiceManager
    • Method Summary

      All Methods Static Methods Instance Methods Abstract Methods Concrete Methods 
      Modifier and Type Method Description
      void close()  
      void endInput​(int inputId)  
      abstract void join​(RowIterator<org.apache.flink.table.data.binary.BinaryRowData> buildIter, org.apache.flink.table.data.RowData probeRow)  
      static HashJoinOperator newHashJoinOperator​(HashJoinType type, boolean leftIsBuild, boolean compressionEnable, int compressionBlockSize, GeneratedJoinCondition condFuncCode, boolean reverseJoinFunction, boolean[] filterNullKeys, GeneratedProjection buildProjectionCode, GeneratedProjection probeProjectionCode, boolean tryDistinctBuildRow, int buildRowSize, long buildRowCount, long probeRowCount, org.apache.flink.table.types.logical.RowType keyType, SortMergeJoinFunction sortMergeJoinFunction)  
      org.apache.flink.streaming.api.operators.InputSelection nextSelection()  
      void open()  
      void processElement1​(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)  
      void processElement2​(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)  
      • Methods inherited from class org.apache.flink.streaming.api.operators.AbstractStreamOperator

        finish, getChainingStrategy, getContainingTask, getCurrentKey, getExecutionConfig, getInternalTimerService, getKeyedStateBackend, getKeyedStateStore, getMetricGroup, getOperatorConfig, getOperatorID, getOperatorName, getOperatorStateBackend, getOrCreateKeyedState, getPartitionedState, getPartitionedState, getProcessingTimeService, getRuntimeContext, getStateKeySelector1, getStateKeySelector2, getTimeServiceManager, getUserCodeClassloader, hasKeyContext1, hasKeyContext2, initializeState, initializeState, isUsingCustomRawKeyedState, notifyCheckpointAborted, notifyCheckpointComplete, prepareSnapshotPreBarrier, processLatencyMarker, processLatencyMarker1, processLatencyMarker2, processRecordAttributes, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark2, processWatermarkStatus, processWatermarkStatus1, processWatermarkStatus2, reportOrForwardLatencyMarker, setChainingStrategy, setCurrentKey, setKeyContextElement1, setKeyContextElement2, setMailboxExecutor, setProcessingTimeService, setup, snapshotState, snapshotState
      • Methods inherited from interface org.apache.flink.api.common.state.CheckpointListener

        notifyCheckpointAborted, notifyCheckpointComplete
      • Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContext

        getCurrentKey, setCurrentKey
      • Methods inherited from interface org.apache.flink.streaming.api.operators.KeyContextHandler

        hasKeyContext
      • Methods inherited from interface org.apache.flink.streaming.api.operators.StreamOperator

        finish, getMetricGroup, getOperatorAttributes, getOperatorID, initializeState, prepareSnapshotPreBarrier, setKeyContextElement1, setKeyContextElement2, snapshotState
      • Methods inherited from interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator

        processLatencyMarker1, processLatencyMarker2, processRecordAttributes1, processRecordAttributes2, processWatermark1, processWatermark2, processWatermarkStatus1, processWatermarkStatus2
    • Method Detail

      • open

        public void open()
                  throws Exception
        Specified by:
        open in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
        Overrides:
        open in class TableStreamOperator<org.apache.flink.table.data.RowData>
        Throws:
        Exception
      • processElement1

        public void processElement1​(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)
                             throws Exception
        Specified by:
        processElement1 in interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,​org.apache.flink.table.data.RowData,​org.apache.flink.table.data.RowData>
        Throws:
        Exception
      • processElement2

        public void processElement2​(org.apache.flink.streaming.runtime.streamrecord.StreamRecord<org.apache.flink.table.data.RowData> element)
                             throws Exception
        Specified by:
        processElement2 in interface org.apache.flink.streaming.api.operators.TwoInputStreamOperator<org.apache.flink.table.data.RowData,​org.apache.flink.table.data.RowData,​org.apache.flink.table.data.RowData>
        Throws:
        Exception
      • nextSelection

        public org.apache.flink.streaming.api.operators.InputSelection nextSelection()
        Specified by:
        nextSelection in interface org.apache.flink.streaming.api.operators.InputSelectable
      • endInput

        public void endInput​(int inputId)
                      throws Exception
        Specified by:
        endInput in interface org.apache.flink.streaming.api.operators.BoundedMultiInput
        Throws:
        Exception
      • join

        public abstract void join​(RowIterator<org.apache.flink.table.data.binary.BinaryRowData> buildIter,
                                  org.apache.flink.table.data.RowData probeRow)
                           throws Exception
        Throws:
        Exception
      • close

        public void close()
                   throws Exception
        Specified by:
        close in interface org.apache.flink.streaming.api.operators.StreamOperator<org.apache.flink.table.data.RowData>
        Overrides:
        close in class org.apache.flink.streaming.api.operators.AbstractStreamOperator<org.apache.flink.table.data.RowData>
        Throws:
        Exception
      • newHashJoinOperator

        public static HashJoinOperator newHashJoinOperator​(HashJoinType type,
                                                           boolean leftIsBuild,
                                                           boolean compressionEnable,
                                                           int compressionBlockSize,
                                                           GeneratedJoinCondition condFuncCode,
                                                           boolean reverseJoinFunction,
                                                           boolean[] filterNullKeys,
                                                           GeneratedProjection buildProjectionCode,
                                                           GeneratedProjection probeProjectionCode,
                                                           boolean tryDistinctBuildRow,
                                                           int buildRowSize,
                                                           long buildRowCount,
                                                           long probeRowCount,
                                                           org.apache.flink.table.types.logical.RowType keyType,
                                                           SortMergeJoinFunction sortMergeJoinFunction)