Package org.apache.flink.api.common.io
Class GenericCsvInputFormat<OT>
- java.lang.Object
-
- org.apache.flink.api.common.io.RichInputFormat<OT,FileInputSplit>
-
- org.apache.flink.api.common.io.FileInputFormat<OT>
-
- org.apache.flink.api.common.io.DelimitedInputFormat<OT>
-
- org.apache.flink.api.common.io.GenericCsvInputFormat<OT>
-
- All Implemented Interfaces:
Serializable,CheckpointableInputFormat<FileInputSplit,Long>,InputFormat<OT,FileInputSplit>,InputSplitSource<FileInputSplit>
@Internal public abstract class GenericCsvInputFormat<OT> extends DelimitedInputFormat<OT>
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.api.common.io.FileInputFormat
FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
-
-
Field Summary
Fields Modifier and Type Field Description protected intcommentCountprotected byte[]commentPrefixprotected boolean[]fieldIncludedprotected intinvalidLineCountprotected booleanlineDelimiterIsLinebreak-
Fields inherited from class org.apache.flink.api.common.io.DelimitedInputFormat
currBuffer, currLen, currOffset, RECORD_DELIMITER
-
Fields inherited from class org.apache.flink.api.common.io.FileInputFormat
currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
-
-
Constructor Summary
Constructors Modifier Constructor Description protectedGenericCsvInputFormat()protectedGenericCsvInputFormat(Path filePath)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected static voidcheckAndCoSort(int[] positions, Class<?>[] types)protected static voidcheckForMonotonousOrder(int[] positions, Class<?>[] types)voidclose()Closes the input by releasing all buffers and closing the file input stream.voidenableQuotedStringParsing(char quoteCharacter)byte[]getCommentPrefix()byte[]getFieldDelimiter()protected FieldParser<?>[]getFieldParsers()protected Class<?>[]getGenericFieldTypes()intgetNumberOfFieldsTotal()intgetNumberOfNonNullFields()protected voidinitializeSplit(FileInputSplit split, Long offset)Initialization method that is called after opening or reopening an input split.booleanisLenient()booleanisSkippingFirstLineAsHeader()protected booleanparseRecord(Object[] holders, byte[] bytes, int offset, int numBytes)voidsetCharset(String charset)Set the name of the character set used for the row delimiter.voidsetCommentPrefix(String commentPrefix)voidsetFieldDelimiter(String delimiter)protected voidsetFieldsGeneric(boolean[] includedMask, Class<?>[] fieldTypes)protected voidsetFieldsGeneric(int[] sourceFieldIndices, Class<?>[] fieldTypes)protected voidsetFieldTypesGeneric(Class<?>... fieldTypes)voidsetLenient(boolean lenient)voidsetSkipFirstLineAsHeader(boolean skipFirstLine)protected intskipFields(byte[] bytes, int startPos, int limit, byte[] delim)booleansupportsMultiPaths()Override this method to supports multiple paths.-
Methods inherited from class org.apache.flink.api.common.io.DelimitedInputFormat
configure, getBufferSize, getCharset, getCurrentState, getDelimiter, getLineLengthLimit, getNumLineSamples, getStatistics, loadConfigParameters, loadGlobalConfigParams, nextRecord, open, reachedEnd, readLine, readRecord, reopen, setBufferSize, setDelimiter, setDelimiter, setDelimiter, setLineLengthLimit, setNumLineSamples
-
Methods inherited from class org.apache.flink.api.common.io.FileInputFormat
acceptFile, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getSupportedCompressionFormats, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, testForUnsplittable, toString
-
Methods inherited from class org.apache.flink.api.common.io.RichInputFormat
closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
-
-
-
-
Constructor Detail
-
GenericCsvInputFormat
protected GenericCsvInputFormat()
-
GenericCsvInputFormat
protected GenericCsvInputFormat(Path filePath)
-
-
Method Detail
-
supportsMultiPaths
public boolean supportsMultiPaths()
Description copied from class:FileInputFormatOverride this method to supports multiple paths. When this method will be removed, all FileInputFormats have to support multiple paths.- Overrides:
supportsMultiPathsin classFileInputFormat<OT>- Returns:
- True if the FileInputFormat supports multiple paths, false otherwise.
-
getNumberOfFieldsTotal
public int getNumberOfFieldsTotal()
-
getNumberOfNonNullFields
public int getNumberOfNonNullFields()
-
setCharset
public void setCharset(String charset)
Description copied from class:DelimitedInputFormatSet the name of the character set used for the row delimiter. This is also used by subclasses to interpret field delimiters, comment strings, and for configuringFieldParsers.These fields are interpreted when set. Changing the charset thereafter may cause unexpected results.
- Overrides:
setCharsetin classDelimitedInputFormat<OT>- Parameters:
charset- name of the charset
-
getCommentPrefix
public byte[] getCommentPrefix()
-
setCommentPrefix
public void setCommentPrefix(String commentPrefix)
-
getFieldDelimiter
public byte[] getFieldDelimiter()
-
setFieldDelimiter
public void setFieldDelimiter(String delimiter)
-
isLenient
public boolean isLenient()
-
setLenient
public void setLenient(boolean lenient)
-
isSkippingFirstLineAsHeader
public boolean isSkippingFirstLineAsHeader()
-
setSkipFirstLineAsHeader
public void setSkipFirstLineAsHeader(boolean skipFirstLine)
-
enableQuotedStringParsing
public void enableQuotedStringParsing(char quoteCharacter)
-
getFieldParsers
protected FieldParser<?>[] getFieldParsers()
-
getGenericFieldTypes
protected Class<?>[] getGenericFieldTypes()
-
setFieldTypesGeneric
protected void setFieldTypesGeneric(Class<?>... fieldTypes)
-
setFieldsGeneric
protected void setFieldsGeneric(int[] sourceFieldIndices, Class<?>[] fieldTypes)
-
setFieldsGeneric
protected void setFieldsGeneric(boolean[] includedMask, Class<?>[] fieldTypes)
-
initializeSplit
protected void initializeSplit(FileInputSplit split, Long offset) throws IOException
Description copied from class:DelimitedInputFormatInitialization method that is called after opening or reopening an input split.- Overrides:
initializeSplitin classDelimitedInputFormat<OT>- Parameters:
split- Split that was opened or reopenedoffset- Checkpointed state if the split was reopened- Throws:
IOException
-
close
public void close() throws IOExceptionDescription copied from class:DelimitedInputFormatCloses the input by releasing all buffers and closing the file input stream.- Specified by:
closein interfaceInputFormat<OT,FileInputSplit>- Overrides:
closein classDelimitedInputFormat<OT>- Throws:
IOException- Thrown, if the closing of the file stream causes an I/O error.
-
parseRecord
protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int numBytes) throws ParseException
- Throws:
ParseException
-
skipFields
protected int skipFields(byte[] bytes, int startPos, int limit, byte[] delim)
-
checkAndCoSort
protected static void checkAndCoSort(int[] positions, Class<?>[] types)
-
checkForMonotonousOrder
protected static void checkForMonotonousOrder(int[] positions, Class<?>[] types)
-
-