Class NestedColumnReader

  • All Implemented Interfaces:
    ColumnReader<org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>

    public class NestedColumnReader
    extends Object
    implements ColumnReader<org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>
    This ColumnReader is mainly used to read the `Group` types in Parquet, such as `Map`, `Array` and `Row`. The way nested structures are resolved mainly follows the striping and assembly algorithms from the Dremel paper.

    Brief explanation of reading repetition and definition levels: a repetition level equal to 0 means that this is the beginning of a new row. Any other value means that the data should be added to the current row.

    For example, given the following data:

        repetition levels: 0,1,1,0,0,1,[0] (the last 0 is implicit and will normally be the end of the page)
        values:            a,b,c,d,e,f

    the values will consist of the sets (a, b, c), (d), (e, f).
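    The grouping rule for repetition levels can be sketched in a few lines of Java. This is only an illustration of the rule described above, not the code used by NestedColumnReader; the class name RepetitionLevelSketch and the method assemble are hypothetical, and values are modelled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the Flink API.
class RepetitionLevelSketch {

    // Groups values into rows: a repetition level of 0 starts a new row,
    // any other level appends the value to the current row.
    static List<List<String>> assemble(int[] repetitionLevels, String[] values) {
        if (repetitionLevels.length != values.length) {
            throw new IllegalArgumentException("expected one repetition level per value");
        }
        List<List<String>> rows = new ArrayList<>();
        List<String> current = null;
        for (int i = 0; i < values.length; i++) {
            if (repetitionLevels[i] == 0 || current == null) {
                current = new ArrayList<>();
                rows.add(current);
            }
            current.add(values[i]);
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] levels = {0, 1, 1, 0, 0, 1};
        String[] values = {"a", "b", "c", "d", "e", "f"};
        // Prints [[a, b, c], [d], [e, f]]
        System.out.println(assemble(levels, values));
    }
}
```

    The implicit trailing 0 is not needed in this sketch, because the last row is closed when the input ends.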

    A definition level falls into one of 3 situations:
      • level = maxDefLevel means the value exists and is not null
      • level = maxDefLevel - 1 means the value is null
      • level < maxDefLevel - 1 means the value doesn't exist

    For non-nullable (REQUIRED) fields, the (level = maxDefLevel - 1) condition means a non-existing value as well.

    Quick example (maxDefLevel is 2). Read 3 rows out of:

        repetition levels: 0,1,0,1,1,0,0,...
        definition levels: 2,1,0,2,1,2,...
        values:            a,b,c,d,e,f,...

    Resulting buffer: a,n, ,d,n,f, so the result is (a,n),(d,n),(f), where n means null.
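    The quick example can be reproduced with a minimal Java sketch that combines both kinds of levels. It assumes a nullable element and, as in the example, treats the values as one positional slot per level entry. The names DefinitionLevelSketch and assemble are hypothetical, and the actual reader writes into a WritableColumnVector rather than into lists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the Flink API.
class DefinitionLevelSketch {

    // slots[i] is the positional value label for level entry i (a simplification
    // that mirrors the example; it is only used when the entry is defined).
    static List<List<String>> assemble(
            int[] repetitionLevels, int[] definitionLevels, String[] slots, int maxDefLevel) {
        List<List<String>> rows = new ArrayList<>();
        List<String> current = null;
        for (int i = 0; i < repetitionLevels.length; i++) {
            // Repetition level 0 starts a new row.
            if (repetitionLevels[i] == 0 || current == null) {
                current = new ArrayList<>();
                rows.add(current);
            }
            if (definitionLevels[i] == maxDefLevel) {
                current.add(slots[i]);   // value exists and is not null
            } else if (definitionLevels[i] == maxDefLevel - 1) {
                current.add(null);       // value is null
            }
            // Lower levels: the value doesn't exist, so nothing is added.
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] rep = {0, 1, 0, 1, 1, 0};
        int[] def = {2, 1, 0, 2, 1, 2};
        String[] slots = {"a", "b", "c", "d", "e", "f"};
        // Prints [[a, null], [d, null], [f]]
        System.out.println(assemble(rep, def, slots, 2));
    }
}
```

    Only the first six level entries are used here; the seventh repetition level (0) in the example marks the start of a fourth row that is not read.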

    • Constructor Detail

      • NestedColumnReader

        public NestedColumnReader​(boolean isUtcTimestamp,
                                  org.apache.parquet.column.page.PageReadStore pages,
                                  ParquetField field)
    • Method Detail

      • readToVector

        public void readToVector​(int readNumber,
                                 org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector vector)
                          throws IOException
        Specified by:
        readToVector in interface ColumnReader<org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>
        Parameters:
        readNumber - number to read.
        vector - vector to write.
        Throws:
        IOException