Class NestedColumnReader
java.lang.Object
    org.apache.flink.formats.parquet.vector.reader.NestedColumnReader

All Implemented Interfaces:
ColumnReader<org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>
public class NestedColumnReader extends Object implements ColumnReader<org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>
This ColumnReader is mainly used to read Parquet `Group` types such as `Map`, `Array`, and `Row`. The approach to resolving nested structures mainly follows the striping and assembly algorithms from the Dremel paper.
Brief explanation of reading repetition and definition levels: a repetition level of 0 marks the beginning of a new row. Any other value means the data should be added to the current row.
For example, given the following data:
repetition levels: 0,1,1,0,0,1,[0] (the last 0 is implicit and normally marks the end of the page)
values: a,b,c,d,e,f
the values are assembled into the sets (a, b, c), (d), (e, f).
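The grouping rule above can be sketched in a few lines of Java. This is an illustrative sketch only, not the reader's actual implementation: the class `RepetitionLevelGrouping` and its `group` method are hypothetical names, and the sketch assumes a single repeated level with exactly one value per repetition level (no nulls or missing values).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Flink or Parquet.
final class RepetitionLevelGrouping {

    // A repetition level of 0 opens a new row; any other level appends
    // the value to the row that is currently open.
    static List<List<String>> group(int[] repetitionLevels, String[] values) {
        if (repetitionLevels.length != values.length) {
            throw new IllegalArgumentException("one repetition level per value is required");
        }
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            if (repetitionLevels[i] == 0) {
                rows.add(new ArrayList<>());
            }
            if (rows.isEmpty()) {
                throw new IllegalArgumentException("the first repetition level must be 0");
            }
            rows.get(rows.size() - 1).add(values[i]);
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] levels = {0, 1, 1, 0, 0, 1};
        String[] values = {"a", "b", "c", "d", "e", "f"};
        System.out.println(group(levels, values)); // prints [[a, b, c], [d], [e, f]]
    }
}
```

The implicit trailing 0 is not needed in this sketch because the end of the input closes the last row.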
Definition levels cover three situations:
level = maxDefLevel means the value exists and is not null;
level = maxDefLevel - 1 means the value is null;
level < maxDefLevel - 1 means the value doesn't exist.
For non-nullable (REQUIRED) fields, the (level = maxDefLevel - 1) condition also means a non-existing value.
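The three cases can be written as a small classification function. This is a hedged sketch of the rule as stated here, not code taken from the reader: `DefinitionLevelStates`, `State`, and `classify` are hypothetical names, and the `nullable` flag is an assumed way of distinguishing OPTIONAL from REQUIRED fields.

```java
// Hypothetical helper for illustration; not part of Flink or Parquet.
final class DefinitionLevelStates {

    enum State { VALUE, NULL, MISSING }

    // Maps a definition level to one of the three situations described above.
    // For REQUIRED fields (nullable == false) the maxDefLevel - 1 case is
    // treated as a non-existing value rather than a null.
    static State classify(int level, int maxDefLevel, boolean nullable) {
        if (level < 0 || level > maxDefLevel) {
            throw new IllegalArgumentException("definition level out of range: " + level);
        }
        if (level == maxDefLevel) {
            return State.VALUE;
        }
        if (nullable && level == maxDefLevel - 1) {
            return State.NULL;
        }
        return State.MISSING;
    }

    public static void main(String[] args) {
        int maxDefLevel = 2;
        for (int level = maxDefLevel; level >= 0; level--) {
            System.out.println(level + " -> " + classify(level, maxDefLevel, true));
        }
    }
}
```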
Quick example (maxDefLevel is 2): read 3 rows out of:
repetition levels: 0,1,0,1,1,0,0,...
definition levels: 2,1,0,2,1,2,...
values: a,b,c,d,e,f,...
Resulting buffer: a,n, ,d,n,f
The result is (a,n),(d,n),(f), where n means null.
Constructor Summary
Constructor
NestedColumnReader(boolean isUtcTimestamp, org.apache.parquet.column.page.PageReadStore pages, ParquetField field)
Method Summary
Modifier and Type    Method
void                 readToVector(int readNumber, org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector vector)
Constructor Detail
NestedColumnReader
public NestedColumnReader(boolean isUtcTimestamp, org.apache.parquet.column.page.PageReadStore pages, ParquetField field)
Method Detail
readToVector
public void readToVector(int readNumber, org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector vector) throws IOException
Specified by:
readToVector in interface ColumnReader<org.apache.flink.table.data.columnar.vector.writable.WritableColumnVector>
Parameters:
readNumber - number to read.
vector - vector to write.
Throws:
IOException