Class ParquetVectorizedInputFormat<T,​SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

  • All Implemented Interfaces:
    Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>, org.apache.flink.connector.file.src.reader.BulkFormat<T,​SplitT>
    Direct Known Subclasses:
    ParquetColumnarRowInputFormat

    public abstract class ParquetVectorizedInputFormat<T,​SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
    extends Object
    implements org.apache.flink.connector.file.src.reader.BulkFormat<T,​SplitT>
    Parquet BulkFormat that reads data from the file to VectorizedColumnBatch in vectorized mode.
    See Also:
    Serialized Form
    • Constructor Detail

      • ParquetVectorizedInputFormat

        public ParquetVectorizedInputFormat​(SerializableConfiguration hadoopConfig,
                                            org.apache.flink.table.types.logical.RowType projectedType,
                                            ColumnBatchFactory<SplitT> batchFactory,
                                            int batchSize,
                                            boolean isUtcTimestamp,
                                            boolean isCaseSensitive)
    • Method Detail

      • createReader

        public org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.ParquetReader createReader​(org.apache.flink.configuration.Configuration config,
                                                                                                        SplitT split)
                                                                                                 throws IOException
        Specified by:
        createReader in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,​SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
        Throws:
        IOException
      • numBatchesToCirculate

        protected int numBatchesToCirculate​(org.apache.flink.configuration.Configuration config)
      • restoreReader

        public org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.ParquetReader restoreReader​(org.apache.flink.configuration.Configuration config,
                                                                                                         SplitT split)
                                                                                                  throws IOException
        Specified by:
        restoreReader in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,​SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
        Throws:
        IOException
      • isSplittable

        public boolean isSplittable()
        Specified by:
        isSplittable in interface org.apache.flink.connector.file.src.reader.BulkFormat<T,​SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
      • createReaderBatch

        protected abstract ParquetVectorizedInputFormat.ParquetReaderBatch<T> createReaderBatch​(org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector[] writableVectors,
                                                                                                org.apache.flink.table.data.columnar.vector.VectorizedColumnBatch columnarBatch,
                                                                                                org.apache.flink.connector.file.src.util.Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<T>> recycler)
        Parameters:
        writableVectors - vectors to be write
        columnarBatch - vectors to be read
        recycler - batch recycler