Class ParquetSplitReaderUtil

    • Constructor Detail

      • ParquetSplitReaderUtil

        public ParquetSplitReaderUtil()
    • Method Detail

      • genPartColumnarRowReader

        public static ParquetColumnarRowSplitReader genPartColumnarRowReader​(boolean utcTimestamp,
                                                                             boolean caseSensitive,
                                                                             org.apache.hadoop.conf.Configuration conf,
                                                                             String[] fullFieldNames,
                                                                             org.apache.flink.table.types.DataType[] fullFieldTypes,
                                                                             Map<String,​Object> partitionSpec,
                                                                             int[] selectedFields,
                                                                             int batchSize,
                                                                             org.apache.flink.core.fs.Path path,
                                                                             long splitStart,
                                                                             long splitLength)
                                                                      throws IOException
        Util for generating partitioned ParquetColumnarRowSplitReader.
        Throws:
        IOException
      • createVectorFromConstant

        public static org.apache.flink.table.data.columnar.vector.ColumnVector createVectorFromConstant​(org.apache.flink.table.types.logical.LogicalType type,
                                                                                                        Object value,
                                                                                                        int batchSize)
      • createColumnReader

        public static ColumnReader createColumnReader​(boolean isUtcTimestamp,
                                                      org.apache.flink.table.types.logical.LogicalType fieldType,
                                                      org.apache.parquet.schema.Type type,
                                                      List<org.apache.parquet.column.ColumnDescriptor> columnDescriptors,
                                                      org.apache.parquet.column.page.PageReadStore pages,
                                                      ParquetField field,
                                                      int depth)
                                               throws IOException
        Throws:
        IOException
      • createWritableColumnVector

        public static org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector createWritableColumnVector​(int batchSize,
                                                                                                                           org.apache.flink.table.types.logical.LogicalType fieldType,
                                                                                                                           org.apache.parquet.schema.Type type,
                                                                                                                           List<org.apache.parquet.column.ColumnDescriptor> columnDescriptors,
                                                                                                                           int depth)
      • buildFieldsList

        public static List<ParquetField> buildFieldsList​(List<org.apache.flink.table.types.logical.RowType.RowField> childrens,
                                                         List<String> fieldNames,
                                                         org.apache.parquet.io.MessageColumnIO columnIO)
      • lookupColumnByName

        public static org.apache.parquet.io.ColumnIO lookupColumnByName​(org.apache.parquet.io.GroupColumnIO groupColumnIO,
                                                                        String columnName)
        Parquet's column names are case in sensitive. So when we look up columns we first check for exact match, and if that can not find we look for a case-insensitive match.
      • getMapKeyValueColumn

        public static org.apache.parquet.io.GroupColumnIO getMapKeyValueColumn​(org.apache.parquet.io.GroupColumnIO groupColumnIO)
      • getArrayElementColumn

        public static org.apache.parquet.io.ColumnIO getArrayElementColumn​(org.apache.parquet.io.ColumnIO columnIO)