Class DeduplicateFunctionHelper


  • public class DeduplicateFunctionHelper
    extends Object
    Utility for deduplicate function.

    TODO utilize the respective helper classes that inherit from an abstract deduplicate function helper in each deduplicate function.

    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static void checkInsertOnly​(org.apache.flink.table.data.RowData currentRow)
      check message should be insert only.
      static boolean isDuplicate​(org.apache.flink.table.data.RowData preRow, org.apache.flink.table.data.RowData currentRow, int rowtimeIndex, boolean keepLastRow)
      Returns current row is duplicate row or not compared to previous row.
      static void processFirstRowOnProcTime​(org.apache.flink.table.data.RowData currentRow, org.apache.flink.api.common.state.ValueState<Boolean> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out)
      Processes element to deduplicate on keys with process time semantic, sends current element if it is first row.
      static void processLastRowOnChangelog​(org.apache.flink.table.data.RowData currentRow, boolean generateUpdateBefore, org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean isStateTtlEnabled, RecordEqualiser equaliser)
      Processes element to deduplicate on keys, sends current element as last row, retracts previous element if needed.
      static void processLastRowOnProcTime​(org.apache.flink.table.data.RowData currentRow, boolean generateUpdateBefore, boolean generateInsert, org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean isStateTtlEnabled, RecordEqualiser equaliser)
      Processes element to deduplicate on keys with process time semantic, sends current element as last row, retracts previous element if needed.
      static void updateDeduplicateResult​(boolean generateUpdateBefore, boolean generateInsert, org.apache.flink.table.data.RowData preRow, org.apache.flink.table.data.RowData currentRow, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out)
      Collect the updated result for duplicate row.
    • Method Detail

      • processLastRowOnProcTime

        public static void processLastRowOnProcTime​(org.apache.flink.table.data.RowData currentRow,
                                                    boolean generateUpdateBefore,
                                                    boolean generateInsert,
                                                    org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state,
                                                    org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out,
                                                    boolean isStateTtlEnabled,
                                                    RecordEqualiser equaliser)
                                             throws Exception
        Processes element to deduplicate on keys with process time semantic, sends current element as last row, retracts previous element if needed.
        Parameters:
        currentRow - latest row received by deduplicate function
        generateUpdateBefore - whether need to send UPDATE_BEFORE message for updates
        state - state of function, null if generateUpdateBefore is false
        out - underlying collector
        isStateTtlEnabled - whether state ttl is disabled
        equaliser - the record equaliser used to equal RowData.
        Throws:
        Exception
      • processLastRowOnChangelog

        public static void processLastRowOnChangelog​(org.apache.flink.table.data.RowData currentRow,
                                                     boolean generateUpdateBefore,
                                                     org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state,
                                                     org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out,
                                                     boolean isStateTtlEnabled,
                                                     RecordEqualiser equaliser)
                                              throws Exception
        Processes element to deduplicate on keys, sends current element as last row, retracts previous element if needed.

        Note: we don't support stateless mode yet. Because this is not safe for Kafka tombstone messages which doesn't contain full content. This can be a future improvement if the downstream (e.g. sink) doesn't require full content for DELETE messages.

        Parameters:
        currentRow - latest row received by deduplicate function
        generateUpdateBefore - whether need to send UPDATE_BEFORE message for updates
        state - state of function
        out - underlying collector
        Throws:
        Exception
      • processFirstRowOnProcTime

        public static void processFirstRowOnProcTime​(org.apache.flink.table.data.RowData currentRow,
                                                     org.apache.flink.api.common.state.ValueState<Boolean> state,
                                                     org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out)
                                              throws Exception
        Processes element to deduplicate on keys with process time semantic, sends current element if it is first row.
        Parameters:
        currentRow - latest row received by deduplicate function
        state - state of function
        out - underlying collector
        Throws:
        Exception
      • updateDeduplicateResult

        public static void updateDeduplicateResult​(boolean generateUpdateBefore,
                                                   boolean generateInsert,
                                                   org.apache.flink.table.data.RowData preRow,
                                                   org.apache.flink.table.data.RowData currentRow,
                                                   org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out)
        Collect the updated result for duplicate row.
        Parameters:
        generateUpdateBefore - flag to generate UPDATE_BEFORE message or not
        generateInsert - flag to generate INSERT message or not
        preRow - previous row under the key
        currentRow - current row under the key which is the duplicate row
        out - underlying collector
      • isDuplicate

        public static boolean isDuplicate​(org.apache.flink.table.data.RowData preRow,
                                          org.apache.flink.table.data.RowData currentRow,
                                          int rowtimeIndex,
                                          boolean keepLastRow)
        Returns current row is duplicate row or not compared to previous row.
      • checkInsertOnly

        public static void checkInsertOnly​(org.apache.flink.table.data.RowData currentRow)
        check message should be insert only.