Package org.apache.flink.formats.parquet
Class ParquetColumnarRowInputFormat<SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
- java.lang.Object
-
- org.apache.flink.formats.parquet.ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT>
-
- org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat<SplitT>
-
- All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>, org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT>, org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
public class ParquetColumnarRowInputFormat<SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> extends ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT> implements org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
A ParquetVectorizedInputFormat to provide a RowData iterator. Uses ColumnarRowData to provide a row view of the column batch.
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat
ParquetVectorizedInputFormat.ParquetReaderBatch<T>
-
-
Field Summary
-
Fields inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat
hadoopConfig, isUtcTimestamp
-
-
Constructor Summary
Constructors Constructor Description
ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType projectedType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Constructor to create a Parquet format without extra fields.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description
static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> ParquetColumnarRowInputFormat<SplitT> createPartitionedFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType producedRowType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Create a partitioned ParquetColumnarRowInputFormat; the partition columns can be generated from the file Path.
protected ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData> createReaderBatch(org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector[] writableVectors, org.apache.flink.table.data.columnar.vector.VectorizedColumnBatch columnarBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>> recycler)
org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
protected int numBatchesToCirculate(org.apache.flink.configuration.Configuration config)
org.apache.flink.table.plan.stats.TableStats reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)
-
Methods inherited from class org.apache.flink.formats.parquet.ParquetVectorizedInputFormat
createReader, isSplittable, restoreReader
-
-
-
-
Constructor Detail
-
ParquetColumnarRowInputFormat
public ParquetColumnarRowInputFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType projectedType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Constructor to create a Parquet format without extra fields.
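A minimal usage sketch of this constructor. The two-column schema, batch size, and file path are illustrative assumptions, not part of this API's documentation:

```java
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

// Hypothetical projected schema: (id INT, name STRING).
RowType projectedType =
        RowType.of(
                new LogicalType[] {new IntType(), new VarCharType(VarCharType.MAX_LENGTH)},
                new String[] {"id", "name"});

ParquetColumnarRowInputFormat<FileSourceSplit> format =
        new ParquetColumnarRowInputFormat<>(
                new org.apache.hadoop.conf.Configuration(), // Hadoop config passed to the Parquet reader
                projectedType,
                InternalTypeInfo.of(projectedType),         // produced TypeInformation<RowData>
                2048,                                       // batchSize (illustrative)
                true,                                       // isUtcTimestamp
                true);                                      // isCaseSensitive

// The format is a BulkFormat, so it can back a FileSource (path is hypothetical):
FileSource<RowData> source =
        FileSource.forBulkFileFormat(format, new Path("/tmp/data.parquet")).build();
```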
-
-
Method Detail
-
numBatchesToCirculate
protected int numBatchesToCirculate(org.apache.flink.configuration.Configuration config)
- Overrides:
numBatchesToCirculate in class ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
-
createReaderBatch
protected ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData> createReaderBatch(org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector[] writableVectors, org.apache.flink.table.data.columnar.vector.VectorizedColumnBatch columnarBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<ParquetVectorizedInputFormat.ParquetReaderBatch<org.apache.flink.table.data.RowData>> recycler)
- Specified by:
createReaderBatch in class ParquetVectorizedInputFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
- Parameters:
writableVectors - vectors to be written
columnarBatch - vectors to be read
recycler - batch recycler
-
getProducedType
public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
- Specified by:
getProducedType in interface org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
- Specified by:
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>
-
reportStatistics
public org.apache.flink.table.plan.stats.TableStats reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)
- Specified by:
reportStatistics in interface org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
-
createPartitionedFormat
public static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> ParquetColumnarRowInputFormat<SplitT> createPartitionedFormat(org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType producedRowType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int batchSize, boolean isUtcTimestamp, boolean isCaseSensitive)
Create a partitioned ParquetColumnarRowInputFormat; the partition columns can be generated from the file Path.
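A sketch of building a partitioned format. The schema, the partition key `dt`, the directory layout (e.g. `.../dt=2024-01-01/part-0.parquet`), and the default-partition name are all illustrative assumptions:

```java
import java.util.Collections;

import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.connector.file.table.PartitionFieldExtractor;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

// Hypothetical produced schema (id INT, dt STRING); it includes the partition column,
// which is read from the path rather than from the Parquet file itself.
RowType producedRowType =
        RowType.of(
                new LogicalType[] {new IntType(), new VarCharType(VarCharType.MAX_LENGTH)},
                new String[] {"id", "dt"});

ParquetColumnarRowInputFormat<FileSourceSplit> format =
        ParquetColumnarRowInputFormat.createPartitionedFormat(
                new org.apache.hadoop.conf.Configuration(),
                producedRowType,
                InternalTypeInfo.of(producedRowType),
                Collections.singletonList("dt"),              // partition keys taken from the path
                PartitionFieldExtractor.forFileSystem(
                        "__DEFAULT_PARTITION__"),             // assumed default-partition name
                2048,                                         // batchSize (illustrative)
                true,                                         // isUtcTimestamp
                true);                                        // isCaseSensitive
```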
-
-