Class ParquetColumnarRowInputFormat<SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

  • All Implemented Interfaces:
    Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>, org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,​SplitT>, org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat

    public class ParquetColumnarRowInputFormat<SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
    extends ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,​SplitT>
    implements org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
    A ParquetVectorizedInputFormat that produces an iterator of RowData. It uses ColumnarRowData to provide a row-oriented view over each column batch.
    See Also:
    Serialized Form
    • Constructor Detail

      • ParquetColumnarRowInputFormat

        public ParquetColumnarRowInputFormat​(org.apache.hadoop.conf.Configuration hadoopConfig,
                                             org.apache.flink.table.types.logical.RowType projectedType,
                                             org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo,
                                             int batchSize,
                                             boolean isUtcTimestamp,
                                             boolean isCaseSensitive)
        Constructor that creates a Parquet format without extra (e.g. partition) fields.
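        A minimal sketch of calling this constructor directly. The schema, batch size, and flag values below are hypothetical, chosen only for illustration; `InternalTypeInfo.of` (from flink-table-runtime) is one way to obtain a matching `TypeInformation<RowData>`:

        ```java
        import org.apache.flink.api.common.typeinfo.TypeInformation;
        import org.apache.flink.connector.file.src.FileSourceSplit;
        import org.apache.flink.table.data.RowData;
        import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
        import org.apache.flink.table.types.logical.IntType;
        import org.apache.flink.table.types.logical.RowType;
        import org.apache.flink.table.types.logical.VarCharType;
        import org.apache.hadoop.conf.Configuration;

        public class ConstructorSketch {
            public static void main(String[] args) {
                Configuration hadoopConfig = new Configuration();

                // Hypothetical projected schema: (id INT, name STRING).
                RowType projectedType = RowType.of(
                        new IntType(),
                        new VarCharType(VarCharType.MAX_LENGTH));
                TypeInformation<RowData> producedTypeInfo =
                        InternalTypeInfo.of(projectedType);

                ParquetColumnarRowInputFormat<FileSourceSplit> format =
                        new ParquetColumnarRowInputFormat<>(
                                hadoopConfig,
                                projectedType,
                                producedTypeInfo,
                                2048,   // batchSize: rows per column batch (assumed value)
                                true,   // isUtcTimestamp
                                true);  // isCaseSensitive
            }
        }
        ```

        The resulting format can then be handed to a `FileSource` builder as its `BulkFormat`.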
    • Method Detail

      • numBatchesToCirculate

        protected int numBatchesToCirculate​(org.apache.flink.configuration.Configuration config)
        Overrides:
        numBatchesToCirculate in class ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,​SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
      • createReaderBatch

        protected ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData> createReaderBatch​(org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector[] writableVectors,
                                                                                                                         org.apache.flink.table.data.columnar.vector.VectorizedColumnBatch columnarBatch,
                                                                                                                         org.apache.flink.connector.file.src.util.Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>> recycler)
        Specified by:
        createReaderBatch in class ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,​SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
        Parameters:
        writableVectors - vectors to write to
        columnarBatch - vectors to read from
        recycler - recycler that returns a batch to the pool for reuse
      • getProducedType

        public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
        Specified by:
        getProducedType in interface org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,​SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
        Specified by:
        getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>
      • reportStatistics

        public org.apache.flink.table.plan.stats.TableStats reportStatistics​(List<org.apache.flink.core.fs.Path> files,
                                                                             org.apache.flink.table.types.DataType producedDataType)
        Specified by:
        reportStatistics in interface org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
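        A hypothetical sketch of asking an already-constructed format for table statistics derived from Parquet file metadata. The file path and column names are illustrative, and `format` is assumed to be a previously constructed ParquetColumnarRowInputFormat:

        ```java
        import java.util.Collections;
        import java.util.List;

        import org.apache.flink.core.fs.Path;
        import org.apache.flink.table.api.DataTypes;
        import org.apache.flink.table.plan.stats.TableStats;
        import org.apache.flink.table.types.DataType;

        // Hypothetical file list and produced type; 'format' is assumed
        // to exist already (see the constructor above).
        List<Path> files =
                Collections.singletonList(new Path("/warehouse/t/part-0.parquet"));
        DataType producedDataType = DataTypes.ROW(
                DataTypes.FIELD("id", DataTypes.INT()),
                DataTypes.FIELD("name", DataTypes.STRING()));

        TableStats stats = format.reportStatistics(files, producedDataType);
        ```

        Because the statistics come from file metadata rather than a full scan, this call is comparatively cheap; the planner can use the returned TableStats for optimization.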
      • createPartitionedFormat

        public static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> ParquetColumnarRowInputFormat<SplitT> createPartitionedFormat​(org.apache.hadoop.conf.Configuration hadoopConfig,
                                                                                                                                                         org.apache.flink.table.types.logical.RowType producedRowType,
                                                                                                                                                         org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo,
                                                                                                                                                         List<String> partitionKeys,
                                                                                                                                                         org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor,
                                                                                                                                                         int batchSize,
                                                                                                                                                         boolean isUtcTimestamp,
                                                                                                                                                         boolean isCaseSensitive)
        Creates a partitioned ParquetColumnarRowInputFormat, whose partition column values can be extracted from the split's Path rather than read from the Parquet files.
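        A sketch of using this factory for a Hive-style partitioned layout. The schema, partition keys, and default-partition name below are assumptions for illustration; `PartitionFieldExtractor.forFileSystem` is one extractor that parses partition values from the split path:

        ```java
        import java.util.Arrays;
        import java.util.List;

        import org.apache.flink.api.common.typeinfo.TypeInformation;
        import org.apache.flink.connector.file.src.FileSourceSplit;
        import org.apache.flink.connector.file.table.PartitionFieldExtractor;
        import org.apache.flink.table.data.RowData;
        import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
        import org.apache.flink.table.types.logical.IntType;
        import org.apache.flink.table.types.logical.LogicalType;
        import org.apache.flink.table.types.logical.RowType;
        import org.apache.flink.table.types.logical.VarCharType;
        import org.apache.hadoop.conf.Configuration;

        public class PartitionedFormatSketch {
            public static void main(String[] args) {
                // Hypothetical produced schema: "dt" and "region" are partition
                // columns encoded in the path (e.g. .../dt=2024-01-01/region=eu/),
                // not stored inside the Parquet files.
                RowType producedRowType = RowType.of(
                        new LogicalType[] {
                                new IntType(),
                                new VarCharType(VarCharType.MAX_LENGTH),
                                new VarCharType(VarCharType.MAX_LENGTH)},
                        new String[] {"id", "dt", "region"});
                TypeInformation<RowData> producedTypeInfo =
                        InternalTypeInfo.of(producedRowType);

                List<String> partitionKeys = Arrays.asList("dt", "region");

                ParquetColumnarRowInputFormat<FileSourceSplit> format =
                        ParquetColumnarRowInputFormat.createPartitionedFormat(
                                new Configuration(),
                                producedRowType,
                                producedTypeInfo,
                                partitionKeys,
                                PartitionFieldExtractor.forFileSystem("__DEFAULT_PARTITION__"),
                                2048,   // batchSize (assumed value)
                                true,   // isUtcTimestamp
                                true);  // isCaseSensitive
            }
        }
        ```

        The factory fills the partition columns of each emitted RowData from values extracted off the split path, so the Parquet files need only contain the remaining columns.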