Package org.apache.flink.api.common.io
Class BinaryInputFormat<T>
- java.lang.Object
-
- org.apache.flink.api.common.io.RichInputFormat<OT,FileInputSplit>
-
- org.apache.flink.api.common.io.FileInputFormat<T>
-
- org.apache.flink.api.common.io.BinaryInputFormat<T>
-
- All Implemented Interfaces:
Serializable, CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>, InputFormat<T,FileInputSplit>, InputSplitSource<FileInputSplit>
- Direct Known Subclasses:
SerializedInputFormat
@Public public abstract class BinaryInputFormat<T> extends FileInputFormat<T> implements CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
Base class for all input formats that use blocks of fixed size. The input splits are aligned to these blocks, meaning that each split will consist of one block. Without configuration, these block sizes equal the native block sizes of HDFS.
A block will contain a BlockInfo at the end of the block. There, the reader can find some statistics about the split currently being read that will help correctly parse the contents of the block.
- See Also:
- Serialized Form
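The block layout described above can be illustrated with a small sketch that does not depend on Flink. It assumes a hypothetical trailer of two long values (a record count and the offset of the first record) and fixed-width int records. The class name BlockTrailerSketch, the trailer fields, and both helper methods are illustrative assumptions, not the actual BlockInfo format.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class BlockTrailerSketch {
    // Hypothetical trailer: two longs stored in the last bytes of every block.
    static final int TRAILER_SIZE = 2 * Long.BYTES;

    /** Packs int records into one fixed-size block and appends the trailer. */
    static byte[] writeBlock(int[] records, int blockSize) {
        int payload = records.length * Integer.BYTES;
        if (payload > blockSize - TRAILER_SIZE) {
            throw new IllegalArgumentException("records do not fit into one block");
        }
        ByteBuffer buf = ByteBuffer.allocate(blockSize); // zero-filled, so padding is implicit
        for (int r : records) {
            buf.putInt(r);
        }
        buf.position(blockSize - TRAILER_SIZE);
        buf.putLong(records.length); // assumed field: number of records in this block
        buf.putLong(0L);             // assumed field: offset of the first record
        return buf.array();
    }

    /** Reads the trailer first, then uses it to parse the records of the block. */
    static int[] readBlock(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        buf.position(block.length - TRAILER_SIZE);
        long recordCount = buf.getLong();
        long firstRecordStart = buf.getLong();
        buf.position((int) firstRecordStart);
        int[] records = new int[(int) recordCount];
        for (int i = 0; i < records.length; i++) {
            records[i] = buf.getInt();
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] block = writeBlock(new int[] {7, 8, 9}, 64);
        System.out.println(Arrays.toString(readBlock(block)));
    }
}
```

Because the trailer sits at a fixed position relative to the block end, a reader that is handed any block-aligned split can locate it without scanning the payload.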
-
-
Nested Class Summary
Nested Classes (modifier and type, class, description)
- protected class BinaryInputFormat.BlockBasedInput: Reads the content of a block of data.
Nested classes/interfaces inherited from class org.apache.flink.api.common.io.FileInputFormat
FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
-
-
Field Summary
Fields (modifier and type, field, description)
- static String BLOCK_SIZE_PARAMETER_KEY: The config parameter which defines the fixed length of a record.
- static long NATIVE_BLOCK_SIZE
Fields inherited from class org.apache.flink.api.common.io.FileInputFormat
currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
-
-
Constructor Summary
Constructors
- BinaryInputFormat()
-
Method Summary
Methods (modifier and type, method, description)
- void configure(Configuration parameters): Configures the file input format by reading the file path from the configuration.
- BlockInfo createBlockInfo()
- FileInputSplit[] createInputSplits(int minNumSplits): Computes the input splits for the file.
- protected org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics createStatistics(List<FileStatus> files, FileInputFormat.FileBaseStatistics stats): Fill in the statistics.
- protected abstract T deserialize(T reuse, DataInputView dataInput)
- long getBlockSize()
- Tuple2<Long,Long> getCurrentState(): Returns the split currently being read, along with its current state.
- protected List<FileStatus> getFiles()
- protected FileInputSplit[] getInputSplits()
- org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics getStatistics(BaseStatistics cachedStats): Obtains basic file statistics containing only file size.
- T nextRecord(T record): Reads the next record from the input.
- void open(FileInputSplit split): Opens an input stream to the file defined in the input format.
- boolean reachedEnd(): Method used to check if the end of the input is reached.
- void reopen(FileInputSplit split, Tuple2<Long,Long> state): Restores the state of a parallel instance reading from an InputFormat.
- void setBlockSize(long blockSize)
Methods inherited from class org.apache.flink.api.common.io.FileInputFormat
acceptFile, close, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, supportsMultiPaths, testForUnsplittable, toString
-
Methods inherited from class org.apache.flink.api.common.io.RichInputFormat
closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
-
-
-
-
Field Detail
-
BLOCK_SIZE_PARAMETER_KEY
public static final String BLOCK_SIZE_PARAMETER_KEY
The config parameter which defines the fixed length of a record.
- See Also:
- Constant Field Values
-
NATIVE_BLOCK_SIZE
public static final long NATIVE_BLOCK_SIZE
- See Also:
- Constant Field Values
-
-
Method Detail
-
configure
public void configure(Configuration parameters)
Description copied from class: FileInputFormat
Configures the file input format by reading the file path from the configuration.
- Specified by:
configure in interface InputFormat<T,FileInputSplit>
- Overrides:
configure in class FileInputFormat<T>
- Parameters:
parameters - The configuration with all parameters (note: not the Flink config but the TaskConfig).
- See Also:
InputFormat.configure(org.apache.flink.configuration.Configuration)
-
setBlockSize
public void setBlockSize(long blockSize)
-
getBlockSize
public long getBlockSize()
-
createInputSplits
public FileInputSplit[] createInputSplits(int minNumSplits) throws IOException
Description copied from class: FileInputFormat
Computes the input splits for the file. By default, one file block is one split. If more splits are requested than blocks are available, then a split may be a fraction of a block and splits may cross block boundaries.
- Specified by:
createInputSplits in interface InputFormat<T,FileInputSplit>
- Specified by:
createInputSplits in interface InputSplitSource<T>
- Overrides:
createInputSplits in class FileInputFormat<T>
- Parameters:
minNumSplits - The minimum desired number of file splits.
- Returns:
The computed file splits.
- Throws:
IOException
- See Also:
InputFormat.createInputSplits(int)
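As a rough illustration of block-aligned splitting, the sketch below cuts a file of a given length into splits of exactly one block each, with a shorter final split when the length is not a multiple of the block size. It is a simplified model under stated assumptions: BlockAlignedSplitsSketch and computeSplits are hypothetical names, each split is represented as a {start, length} pair, and how this class actually treats a trailing partial block is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockAlignedSplitsSketch {
    /**
     * Returns one {start, length} pair per block. Every split starts on a
     * block boundary; only the last one may be shorter than blockSize.
     */
    static List<long[]> computeSplits(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            long length = Math.min(blockSize, fileLength - start);
            splits.add(new long[] {start, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] split : computeSplits(250, 100)) {
            System.out.println("start=" + split[0] + " length=" + split[1]);
        }
    }
}
```

Aligning every split start to a multiple of the block size is what lets a reader find the per-block metadata at a predictable offset.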
-
getFiles
protected List<FileStatus> getFiles() throws IOException
- Throws:
IOException
-
getStatistics
public org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics getStatistics(BaseStatistics cachedStats)
Description copied from class: FileInputFormat
Obtains basic file statistics containing only file size. If the input is a directory, then the size is the sum of all contained files.
- Specified by:
getStatistics in interface InputFormat<T,FileInputSplit>
- Overrides:
getStatistics in class FileInputFormat<T>
- Parameters:
cachedStats - The statistics that were cached. May be null.
- Returns:
- The base statistics for the input, or null, if not available.
- See Also:
InputFormat.getStatistics(org.apache.flink.api.common.io.statistics.BaseStatistics)
-
getInputSplits
protected FileInputSplit[] getInputSplits() throws IOException
- Throws:
IOException
-
createBlockInfo
public BlockInfo createBlockInfo()
-
createStatistics
protected org.apache.flink.api.common.io.BinaryInputFormat.SequentialStatistics createStatistics(List<FileStatus> files, FileInputFormat.FileBaseStatistics stats) throws IOException
Fill in the statistics. The last modification time and the total input size are prefilled.
- Parameters:
files - The files that are associated with this block input format.
stats - The pre-filled statistics.
- Throws:
IOException
-
open
public void open(FileInputSplit split) throws IOException
Description copied from class: FileInputFormat
Opens an input stream to the file defined in the input format. The stream is positioned at the beginning of the given split.
The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.
- Specified by:
open in interface InputFormat<T,FileInputSplit>
- Overrides:
open in class FileInputFormat<T>
- Parameters:
split - The split to be opened.
- Throws:
IOException - Thrown if the split could not be opened due to an I/O problem.
-
reachedEnd
public boolean reachedEnd() throws IOException
Description copied from interface: InputFormat
Method used to check if the end of the input is reached.
When this method is called, the input format is guaranteed to be opened.
- Specified by:
reachedEnd in interface InputFormat<T,FileInputSplit>
- Returns:
True if the end is reached, otherwise false.
- Throws:
IOException - Thrown if an I/O error occurred.
-
nextRecord
public T nextRecord(T record) throws IOException
Description copied from interface: InputFormat
Reads the next record from the input.
When this method is called, the input format is guaranteed to be opened.
- Specified by:
nextRecord in interface InputFormat<T,FileInputSplit>
- Parameters:
record - Object that may be reused.
- Returns:
Read record.
- Throws:
IOException - Thrown if an I/O error occurred.
-
deserialize
protected abstract T deserialize(T reuse, DataInputView dataInput) throws IOException
- Throws:
IOException
-
getCurrentState
@PublicEvolving public Tuple2<Long,Long> getCurrentState() throws IOException
Description copied from interface: CheckpointableInputFormat
Returns the split currently being read, along with its current state. This will be used to restore the state of the reading channel when recovering from a task failure. In the case of a simple text file, the state can correspond to the last read offset in the split.
- Specified by:
getCurrentState in interface CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
- Returns:
The state of the channel.
- Throws:
IOException - Thrown if the creation of the state object failed.
-
reopen
@PublicEvolving public void reopen(FileInputSplit split, Tuple2<Long,Long> state) throws IOException
Description copied from interface: CheckpointableInputFormat
Restores the state of a parallel instance reading from an InputFormat. This is necessary when recovering from a task failure. When this method is called, the input format is guaranteed to be configured.
NOTE: The caller has to make sure that the provided split is the one to which the state belongs.
- Specified by:
reopen in interface CheckpointableInputFormat<FileInputSplit,Tuple2<Long,Long>>
- Parameters:
split - The split to be opened.
state - The state from which to start. This can contain the offset, but also other data, depending on the input format.
- Throws:
IOException
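The interplay of getCurrentState and reopen can be sketched with a minimal in-memory reader that does not depend on Flink. This page only gives the state type (a pair of Long values); the meaning assigned to the pair here, a block start offset plus the number of records already read, is an assumption made for illustration, as are the class name CheckpointStateSketch and its constructor.

```java
final class CheckpointStateSketch {
    private final long blockStart; // assumed first state component
    private final int[] records;   // records of the single block in this split
    private int readRecords;       // assumed second state component

    CheckpointStateSketch(long blockStart, int[] records) {
        this.blockStart = blockStart;
        this.records = records;
    }

    boolean reachedEnd() {
        return readRecords >= records.length;
    }

    int nextRecord() {
        return records[readRecords++];
    }

    /** Snapshot taken at a checkpoint: {blockStart, readRecords}. */
    long[] getCurrentState() {
        return new long[] {blockStart, readRecords};
    }

    /** Restores the read position; rejects state that belongs to another split. */
    void reopen(long[] state) {
        if (state[0] != blockStart) {
            throw new IllegalArgumentException("state belongs to a different split");
        }
        readRecords = (int) state[1];
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40};
        CheckpointStateSketch reader = new CheckpointStateSketch(0L, data);
        reader.nextRecord();
        reader.nextRecord();
        long[] state = reader.getCurrentState();

        // Simulate recovery: a fresh reader on the same split resumes from the snapshot.
        CheckpointStateSketch recovered = new CheckpointStateSketch(0L, data);
        recovered.reopen(state);
        System.out.println(recovered.nextRecord());
    }
}
```

The check inside reopen mirrors the note above: the state is only meaningful for the split it was taken from, so in this sketch a mismatch fails fast rather than resuming at a wrong position.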
-
-