Class StreamingFileSink<IN>

  • Type Parameters:
    IN - Type of the elements emitted by this sink
    All Implemented Interfaces:
    Serializable, org.apache.flink.api.common.functions.Function, org.apache.flink.api.common.functions.RichFunction, org.apache.flink.api.common.state.CheckpointListener, CheckpointedFunction, SinkFunction<IN>

    @PublicEvolving
    @Deprecated
    public class StreamingFileSink<IN>
    extends RichSinkFunction<IN>
    implements CheckpointedFunction, org.apache.flink.api.common.state.CheckpointListener
    Deprecated.
    Use org.apache.flink.connector.file.sink.FileSink instead.
    Sink that emits its input elements to FileSystem files within buckets. This is integrated with the checkpointing mechanism to provide exactly once semantics.

    When creating the sink a basePath must be specified. The base directory contains one directory for every bucket. The bucket directories themselves contain several part files, with at least one for each parallel subtask of the sink which is writing data to that bucket. These part files contain the actual output data.

    The sink uses a BucketAssigner to determine in which bucket directory each element should be written to inside the base directory. The BucketAssigner can, for example, use time or a property of the element to determine the bucket directory. The default BucketAssigner is a DateTimeBucketAssigner which will create one new bucket every hour. You can specify a custom BucketAssigner using the setBucketAssigner(bucketAssigner) method, after calling forRowFormat(Path, Encoder) or forBulkFormat(Path, BulkWriter.Factory).

    The names of the part files could be defined using OutputFileConfig. This configuration contains a part prefix and a part suffix that will be used with the parallel subtask index of the sink and a rolling counter to determine the file names. For example with a prefix "prefix" and a suffix ".ext", a file named "prefix-1-17.ext" contains the data from subtask 1 of the sink and is the 17th bucket created by that subtask.

    Part files roll based on the user-specified RollingPolicy. By default, a DefaultRollingPolicy is used for row-encoded sink output; a OnCheckpointRollingPolicy is used for bulk-encoded sink output.

    In some scenarios, the open buckets are required to change based on time. In these cases, the user can specify a bucketCheckInterval (by default 1m) and the sink will check periodically and roll the part file if the specified rolling policy says so.

    Part files can be in one of three states: in-progress, pending or finished. The reason for this is how the sink works together with the checkpointing mechanism to provide exactly-once semantics and fault-tolerance. The part file that is currently being written to is in-progress. Once a part file is closed for writing it becomes pending. When a checkpoint is successful the currently pending files will be moved to finished.

    If case of a failure, and in order to guarantee exactly-once semantics, the sink should roll back to the state it had when that last successful checkpoint occurred. To this end, when restoring, the restored files in pending state are transferred into the finished state while any in-progress files are rolled back, so that they do not contain data that arrived after the checkpoint from which we restore.

    See Also:
    Serialized Form
    • Method Detail

      • forRowFormat

        public static <IN> StreamingFileSink.DefaultRowFormatBuilder<IN> forRowFormat​(org.apache.flink.core.fs.Path basePath,
                                                                                      org.apache.flink.api.common.serialization.Encoder<IN> encoder)
        Deprecated.
        Creates the builder for a StreamingFileSink with row-encoding format.
        Type Parameters:
        IN - the type of incoming elements
        Parameters:
        basePath - the base path where all the buckets are going to be created as sub-directories.
        encoder - the Encoder to be used when writing elements in the buckets.
        Returns:
        The builder where the remaining of the configuration parameters for the sink can be configured. In order to instantiate the sink, call StreamingFileSink.RowFormatBuilder.build() after specifying the desired parameters.
      • forBulkFormat

        public static <IN> StreamingFileSink.DefaultBulkFormatBuilder<IN> forBulkFormat​(org.apache.flink.core.fs.Path basePath,
                                                                                        org.apache.flink.api.common.serialization.BulkWriter.Factory<IN> writerFactory)
        Deprecated.
        Creates the builder for a StreamingFileSink with bulk-encoding format.
        Type Parameters:
        IN - the type of incoming elements
        Parameters:
        basePath - the base path where all the buckets are going to be created as sub-directories.
        writerFactory - the BulkWriter.Factory to be used when writing elements in the buckets.
        Returns:
        The builder where the remaining of the configuration parameters for the sink can be configured. In order to instantiate the sink, call StreamingFileSink.BulkFormatBuilder.build() after specifying the desired parameters.
      • initializeState

        public void initializeState​(org.apache.flink.runtime.state.FunctionInitializationContext context)
                             throws Exception
        Deprecated.
        Description copied from interface: CheckpointedFunction
        This method is called when the parallel function instance is created during distributed execution. Functions typically set up their state storing data structures in this method.
        Specified by:
        initializeState in interface CheckpointedFunction
        Parameters:
        context - the context for initializing the operator
        Throws:
        Exception - Thrown, if state could not be created ot restored.
      • notifyCheckpointComplete

        public void notifyCheckpointComplete​(long checkpointId)
                                      throws Exception
        Deprecated.
        Specified by:
        notifyCheckpointComplete in interface org.apache.flink.api.common.state.CheckpointListener
        Throws:
        Exception
      • notifyCheckpointAborted

        public void notifyCheckpointAborted​(long checkpointId)
        Deprecated.
        Specified by:
        notifyCheckpointAborted in interface org.apache.flink.api.common.state.CheckpointListener
      • snapshotState

        public void snapshotState​(org.apache.flink.runtime.state.FunctionSnapshotContext context)
                           throws Exception
        Deprecated.
        Description copied from interface: CheckpointedFunction
        This method is called when a snapshot for a checkpoint is requested. This acts as a hook to the function to ensure that all state is exposed by means previously offered through FunctionInitializationContext when the Function was initialized, or offered now by FunctionSnapshotContext itself.
        Specified by:
        snapshotState in interface CheckpointedFunction
        Parameters:
        context - the context for drawing a snapshot of the operator
        Throws:
        Exception - Thrown, if state could not be created ot restored.
      • invoke

        public void invoke​(IN value,
                           SinkFunction.Context context)
                    throws Exception
        Deprecated.
        Description copied from interface: SinkFunction
        Writes the given value to the sink. This function is called for every record.

        You have to override this method when implementing a SinkFunction, this is a default method for backward compatibility with the old-style method only.

        Specified by:
        invoke in interface SinkFunction<IN>
        Parameters:
        value - The input record.
        context - Additional context about the input record.
        Throws:
        Exception - This method may throw exceptions. Throwing an exception will cause the operation to fail and may trigger recovery.
      • close

        public void close()
                   throws Exception
        Deprecated.
        Specified by:
        close in interface org.apache.flink.api.common.functions.RichFunction
        Overrides:
        close in class org.apache.flink.api.common.functions.AbstractRichFunction
        Throws:
        Exception