Interface FileMergingSnapshotManager

  • All Superinterfaces:
    AutoCloseable, Closeable
    All Known Implementing Classes:
    FileMergingSnapshotManagerBase, WithinCheckpointFileMergingSnapshotManager

    public interface FileMergingSnapshotManager
    extends Closeable
    FileMergingSnapshotManager provides an interface for managing files and meta information for checkpoint files when checkpoint file merging is enabled. It manages the files for ONE single task in a TM, including all subtasks of this task that are running in that TM. There is one FileMergingSnapshotManager for each job per task manager.

    TODO (FLINK-32075): leverage checkpoint notification to delete logical files.

    • Method Detail

      • initFileSystem

        void initFileSystem​(org.apache.flink.core.fs.FileSystem fileSystem,
                            org.apache.flink.core.fs.Path checkpointBaseDir,
                            org.apache.flink.core.fs.Path sharedStateDir,
                            org.apache.flink.core.fs.Path taskOwnedStateDir,
                            int writeBufferSize)
                     throws IllegalArgumentException
        Initialize the file system, recording the checkpoint path the manager should work with.
         The layout of the checkpoint directory:
         /user-defined-checkpoint-dir
             /{job-id} (checkpointBaseDir)
                 |
                 + --shared/
                     |
                     + --subtask-1/
                         + -- merged shared state files
                     + --subtask-2/
                         + -- merged shared state files
                 + --taskowned/
                     + -- merged private state files
                 + --chk-1/
                 + --chk-2/
                 + --chk-3/
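
         The layout above can be read as a handful of path resolutions against checkpointBaseDir. The following is a minimal sketch only: it uses java.nio.file.Path as a stand-in for org.apache.flink.core.fs.Path, and the class name, the method names and the "subtask-N" naming are hypothetical, taken from the diagram rather than from the actual implementation.

```java
import java.nio.file.Path;

/** Hypothetical helper mirroring the directory diagram; not part of Flink. */
class CheckpointLayoutSketch {

    /** {checkpointBaseDir}/shared/subtask-N, holding merged shared state files. */
    static Path sharedDirFor(Path checkpointBaseDir, int subtaskNumber) {
        return checkpointBaseDir.resolve("shared").resolve("subtask-" + subtaskNumber);
    }

    /** {checkpointBaseDir}/taskowned, holding merged private state files. */
    static Path taskOwnedDir(Path checkpointBaseDir) {
        return checkpointBaseDir.resolve("taskowned");
    }

    /** {checkpointBaseDir}/chk-N, the per-checkpoint directory. */
    static Path checkpointDir(Path checkpointBaseDir, long checkpointId) {
        return checkpointBaseDir.resolve("chk-" + checkpointId);
    }
}
```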
         

        The directories are initialized in this method rather than in the constructor because the FileMergingSnapshotManager belongs to the TaskStateManager, which is initialized when a task is received, whereas the base directories for checkpoints are created by FsCheckpointStorageAccess when the state backend initializes for each subtask. Once the checkpoint directories exist, the managed subdirectories are initialized here.

        Note: This method may be called several times. The implementation should be idempotent and should throw IllegalArgumentException when any of the paths passed as parameters changes across calls.

        Parameters:
        fileSystem - The filesystem to write to.
        checkpointBaseDir - The base directory for checkpoints.
        sharedStateDir - The directory for shared checkpoint data.
        taskOwnedStateDir - The directory for state that is owned and released by the TaskManagers rather than by the master.
        writeBufferSize - The buffer size for writing files to the file system.
        Throws:
        IllegalArgumentException - thrown if any of the three paths differs from the one passed in an earlier call.
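
        The idempotency contract described for this method could be honored roughly as follows. This is a sketch under stated assumptions, not the actual implementation: IdempotentInitSketch is a hypothetical class, java.nio.file.Path stands in for org.apache.flink.core.fs.Path, and the fileSystem and writeBufferSize parameters are left out for brevity.

```java
import java.nio.file.Path;
import java.util.Objects;

/** Hypothetical sketch of an idempotent init; not the Flink implementation. */
class IdempotentInitSketch {
    private Path checkpointBaseDir;
    private Path sharedStateDir;
    private Path taskOwnedStateDir;

    /** Records the three paths on the first call; later calls must pass equal paths. */
    synchronized void initFileSystem(
            Path checkpointBaseDir, Path sharedStateDir, Path taskOwnedStateDir) {
        Objects.requireNonNull(checkpointBaseDir, "checkpointBaseDir");
        Objects.requireNonNull(sharedStateDir, "sharedStateDir");
        Objects.requireNonNull(taskOwnedStateDir, "taskOwnedStateDir");

        if (this.checkpointBaseDir != null) {
            // Repeated call: accept only identical paths and change nothing.
            if (!this.checkpointBaseDir.equals(checkpointBaseDir)
                    || !this.sharedStateDir.equals(sharedStateDir)
                    || !this.taskOwnedStateDir.equals(taskOwnedStateDir)) {
                throw new IllegalArgumentException(
                        "Checkpoint paths changed across initFileSystem calls");
            }
            return;
        }

        // First call: record the paths. A real implementation would also
        // set up its managed subdirectories at this point.
        this.checkpointBaseDir = checkpointBaseDir;
        this.sharedStateDir = sharedStateDir;
        this.taskOwnedStateDir = taskOwnedStateDir;
    }

    Path getSharedStateDir() {
        return sharedStateDir;
    }
}
```

        Calling the sketch twice with the same three paths is a no-op on the second call, while passing a different path after the first call raises IllegalArgumentException.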
      • createCheckpointStateOutputStream

        FileMergingCheckpointStateOutputStream createCheckpointStateOutputStream​(FileMergingSnapshotManager.SubtaskKey subtaskKey,
                                                                                 long checkpointId,
                                                                                 CheckpointedStateScope scope)
        Create a new FileMergingCheckpointStateOutputStream. Depending on the file merging strategy, the streams returned by multiple calls to this method may share the same underlying physical file, with each stream writing to its own segment of that file.
        Parameters:
        subtaskKey - The subtask key identifying the subtask.
        checkpointId - ID of the checkpoint.
        scope - The state's scope, whether it is exclusive or shared.
        Returns:
        An output stream that writes state for the given checkpoint.
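
        To illustrate how several streams can write to segments of one physical file, here is a small in-memory sketch. All names (SharedFileSegmentsSketch, PhysicalFile, SegmentStream) are hypothetical and unrelated to the real FileMergingCheckpointStateOutputStream. The sketch also assumes that only one segment is open at a time and that the "file" is a byte buffer rather than a file system file.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical illustration of segment streams over one shared file. */
class SharedFileSegmentsSketch {

    /** In-memory stand-in for one physical file shared by several streams. */
    static final class PhysicalFile {
        private final ByteArrayOutputStream data = new ByteArrayOutputStream();
        private boolean segmentOpen;

        /** Opens a stream whose segment starts at the current end of the file. */
        SegmentStream openSegment() {
            if (segmentOpen) {
                throw new IllegalStateException("another segment is still open");
            }
            segmentOpen = true;
            return new SegmentStream(this, data.size());
        }

        /** Returns the bytes of one segment, given its offset and length. */
        byte[] read(int offset, int length) {
            return Arrays.copyOfRange(data.toByteArray(), offset, offset + length);
        }
    }

    /** A stream that appends to the shared file and remembers its own segment. */
    static final class SegmentStream extends OutputStream {
        private final PhysicalFile file;
        private final int startOffset;
        private int length;
        private boolean closed;

        private SegmentStream(PhysicalFile file, int startOffset) {
            this.file = file;
            this.startOffset = startOffset;
        }

        @Override
        public void write(int b) {
            if (closed) {
                throw new IllegalStateException("segment already closed");
            }
            file.data.write(b);
            length++;
        }

        @Override
        public void write(byte[] bytes) {
            for (byte b : bytes) {
                write(b);
            }
        }

        /** Closes this segment only; the shared file stays usable for the next one. */
        @Override
        public void close() {
            if (!closed) {
                closed = true;
                file.segmentOpen = false;
            }
        }

        int getStartOffset() {
            return startOffset;
        }

        int getLength() {
            return length;
        }
    }
}
```

        In this sketch, the (start offset, length) pair recorded by each stream is what a reader would need to locate that stream's data inside the shared file.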