Class RowTimeDeduplicateFunction

  • All Implemented Interfaces:
    Serializable, org.apache.flink.api.common.functions.Function, org.apache.flink.api.common.functions.RichFunction

    public class RowTimeDeduplicateFunction
    extends org.apache.flink.streaming.api.functions.KeyedProcessFunction<K,​IN,​OUT>
    This function is used to deduplicate on keys and keeps only first or last row on row time.
    See Also:
    Serialized Form
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.flink.streaming.api.functions.KeyedProcessFunction

        org.apache.flink.streaming.api.functions.KeyedProcessFunction.Context, org.apache.flink.streaming.api.functions.KeyedProcessFunction.OnTimerContext
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected org.apache.flink.api.common.typeutils.TypeSerializer<OUT> serializer  
      protected org.apache.flink.api.common.state.ValueState<T> state  
      protected long stateRetentionTime  
      protected org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInfo  
    • Constructor Summary

      Constructors 
      Constructor Description
      RowTimeDeduplicateFunction​(InternalTypeInfo<org.apache.flink.table.data.RowData> typeInfo, long minRetentionTime, int rowtimeIndex, boolean generateUpdateBefore, boolean generateInsert, boolean keepLastRow)  
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      static void deduplicateOnRowTime​(org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state, org.apache.flink.table.data.RowData currentRow, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out, boolean generateUpdateBefore, boolean generateInsert, int rowtimeIndex, boolean keepLastRow)
      Processes element to deduplicate on keys with row time semantic, sends current element if it is last or first row, retracts previous element if needed.
      void open​(org.apache.flink.configuration.Configuration configure)  
      void processElement​(org.apache.flink.table.data.RowData input, org.apache.flink.streaming.api.functions.KeyedProcessFunction.Context ctx, org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out)  
      • Methods inherited from class org.apache.flink.streaming.api.functions.KeyedProcessFunction

        onTimer
      • Methods inherited from class org.apache.flink.api.common.functions.AbstractRichFunction

        close, getIterationRuntimeContext, getRuntimeContext, setRuntimeContext
    • Field Detail

      • typeInfo

        protected final org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInfo
      • stateRetentionTime

        protected final long stateRetentionTime
      • serializer

        protected final org.apache.flink.api.common.typeutils.TypeSerializer<OUT> serializer
      • state

        protected org.apache.flink.api.common.state.ValueState<T> state
    • Constructor Detail

      • RowTimeDeduplicateFunction

        public RowTimeDeduplicateFunction​(InternalTypeInfo<org.apache.flink.table.data.RowData> typeInfo,
                                          long minRetentionTime,
                                          int rowtimeIndex,
                                          boolean generateUpdateBefore,
                                          boolean generateInsert,
                                          boolean keepLastRow)
    • Method Detail

      • processElement

        public void processElement​(org.apache.flink.table.data.RowData input,
                                   org.apache.flink.streaming.api.functions.KeyedProcessFunction.Context ctx,
                                   org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out)
                            throws Exception
        Specified by:
        processElement in class org.apache.flink.streaming.api.functions.KeyedProcessFunction<org.apache.flink.table.data.RowData,​org.apache.flink.table.data.RowData,​org.apache.flink.table.data.RowData>
        Throws:
        Exception
      • deduplicateOnRowTime

        public static void deduplicateOnRowTime​(org.apache.flink.api.common.state.ValueState<org.apache.flink.table.data.RowData> state,
                                                org.apache.flink.table.data.RowData currentRow,
                                                org.apache.flink.util.Collector<org.apache.flink.table.data.RowData> out,
                                                boolean generateUpdateBefore,
                                                boolean generateInsert,
                                                int rowtimeIndex,
                                                boolean keepLastRow)
                                         throws Exception
        Processes element to deduplicate on keys with row time semantic, sends current element if it is last or first row, retracts previous element if needed.
        Parameters:
        state - state of function
        currentRow - latest row received by deduplicate function
        out - underlying collector
        generateUpdateBefore - flag to generate UPDATE_BEFORE message or not
        generateInsert - flag to gennerate INSERT message or not
        rowtimeIndex - the index of rowtime field
        keepLastRow - flag to keep last row or keep first row
        Throws:
        Exception
      • open

        public void open​(org.apache.flink.configuration.Configuration configure)
                  throws Exception
        Specified by:
        open in interface org.apache.flink.api.common.functions.RichFunction
        Overrides:
        open in class org.apache.flink.api.common.functions.AbstractRichFunction
        Throws:
        Exception